Octopus: Predicting and Analyzing Instagram

Overview

Demo video

GitHub: https://github.com/kimsup10/octopus

Paul, Octopus

This is a web application that analyzes data originating from Instagram.

It does two things:

  • Tells an Instagram user which taste groups his/her followers fall into
  • Predicts how many likes he/she will receive for a post

Technical Stack

  • Python, Flask
    • numpy
    • pickle
    • KoNLPy
  • D3.js
  • Selenium
  • ResNet

Contribution

Oops, I can see that I am missing from the contributors list on GitHub. This is because of my GitHub account settings; you can still see my contributions in the commit history.

Project Detail

Data Collection

On the first request for a target user, the app crawls every page of that user's Instagram profile with the Selenium web driver and caches the result on disk using pickle.
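The caching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name load_or_crawl, the cache directory, and the crawl callback (standing in for the Selenium-driven scraper) are all hypothetical.

```python
import pickle
from pathlib import Path

CACHE_DIR = Path("cache")  # hypothetical cache location


def load_or_crawl(username, crawl, cache_dir=CACHE_DIR):
    """Return cached posts for `username`, crawling only on a cache miss.

    `crawl` is any callable that takes a username and returns picklable data;
    in the real application this would be the Selenium-based crawler.
    """
    cache_dir = Path(cache_dir)
    cache_file = cache_dir / f"{username}.pickle"
    if cache_file.exists():
        # Cache hit: skip the slow browser-driven crawl.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    posts = crawl(username)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(posts, f)
    return posts
```

With this shape, only the first request for a user pays the crawling cost; later requests read the pickle file directly.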

Preprocessing

  • Used KoNLPy's Mecab tagger to perform morpheme analysis on post captions. Only words tagged as nouns are kept, on the assumption that nouns, rather than the grammatical elements of the text, are the specific keywords that affect the number of likes.
  • Calculated the conditional probabilities used by the Naive Bayes model from these noun-level features.
  • Used Keras' pre-trained ResNet model to get a list of objects and animals in the posted photos and used them as features.
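As a rough illustration of the conditional-probability step, the sketch below estimates how often each feature (a noun or an object label) appears in liked versus not-liked posts. The exact estimator used in the project is not stated here, so the Bernoulli-style counting, the add-alpha smoothing, and the function name conditional_probabilities are assumptions.

```python
from collections import Counter


def conditional_probabilities(samples, alpha=1.0):
    """Estimate P(feature | liked) and P(feature | not liked).

    `samples` is a list of (features, liked) pairs, where `features` is a
    collection of nouns/object labels for one post and `liked` is a bool.
    Add-alpha smoothing (an assumption) keeps every probability above zero.
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    vocab = set()
    for features, liked in samples:
        unique = set(features)
        totals[liked] += 1
        counts[liked].update(unique)
        vocab.update(unique)
    # Fraction of posts in each class that contain the feature, smoothed.
    return {
        label: {
            f: (counts[label][f] + alpha) / (totals[label] + 2 * alpha)
            for f in vocab
        }
        for label in (True, False)
    }
```

The returned dictionary is keyed by True (liked) and False (not liked), each mapping a feature to its smoothed probability.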

Modeling

Taste group clustering: K-Means clustering

K
The number of clusters K is set to the square root of the number of engaged users being clustered.

Distance
Distance(A, B) = HammingDistance(V_A, V_B)
    where V_A[i] = 1 if user A liked post i, and 0 otherwise,
    e.g. V_A = <0, 1, 1, 0, 0, 0, 0, 1>
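The like vectors, the distance, and the choice of K can be sketched in a few lines. The helper names are hypothetical, and rounding the square root to the nearest integer (with a minimum of 1) is an assumption, since the text does not say how a non-integer root is handled.

```python
import math

import numpy as np


def like_vector(liked_post_ids, all_post_ids):
    """Binary vector: entry i is 1 if the user liked post i, else 0."""
    liked = set(liked_post_ids)
    return np.array([1 if p in liked else 0 for p in all_post_ids], dtype=int)


def hamming_distance(va, vb):
    """Number of positions where two equal-length binary vectors differ."""
    return int(np.count_nonzero(va != vb))


def choose_k(n_users):
    """K = square root of the number of engaged users (rounded, at least 1)."""
    return max(1, round(math.sqrt(n_users)))


# Example: user A liked posts 1, 2 and 7 out of eight posts.
v_a = like_vector([1, 2, 7], range(8))   # <0, 1, 1, 0, 0, 0, 0, 1>
v_b = like_vector([1, 3], range(8))      # <0, 1, 0, 1, 0, 0, 0, 0>
distance_ab = hamming_distance(v_a, v_b)
```

For 0/1 vectors the Hamming distance equals the squared Euclidean distance, so the two agree on which users are close to each other.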

Likes prediction: Regression with Naive Bayes

We combine a regression model with a Naive Bayes model to predict the number of likes. Naive Bayes estimates the probability that a user will like a post containing specific words and pictured objects; this probability is then multiplied by a fitted regression parameter to produce the expected number of likes.
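A minimal sketch of this two-stage idea is shown below. It is an illustration under assumptions rather than the project's implementation: the function names, the dictionary layout of the conditional probabilities, and the single-parameter least-squares fit through the origin are all hypothetical choices.

```python
import numpy as np


def like_probability(features, prior_like, p_given_like, p_given_skip):
    """Naive Bayes posterior P(like | features).

    `p_given_like` and `p_given_skip` map a feature to P(feature | like) and
    P(feature | no like). Features missing from either table are ignored.
    """
    log_like = np.log(prior_like)
    log_skip = np.log(1.0 - prior_like)
    for f in features:
        if f in p_given_like and f in p_given_skip:
            log_like += np.log(p_given_like[f])
            log_skip += np.log(p_given_skip[f])
    # Normalise in log space for numerical stability.
    m = max(log_like, log_skip)
    like, skip = np.exp(log_like - m), np.exp(log_skip - m)
    return float(like / (like + skip))


def fit_scale(probabilities, likes):
    """Least-squares w for the model likes ≈ w * probability (no intercept)."""
    p = np.asarray(probabilities, dtype=float)
    y = np.asarray(likes, dtype=float)
    return float(np.dot(p, y) / np.dot(p, p))


def predict_likes(probability, w):
    """Expected number of likes for a post with the given like probability."""
    return w * probability
```

Here fit_scale would be run on past posts with known like counts, and predict_likes applied to the Naive Bayes probability of a new post.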

Evaluation

  • R-squared: 0.36853623771555538
  • MSE: 192.87811680608883
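For reference, both metrics can be computed from true and predicted like counts with numpy alone. This sketch uses the standard definitions; the function names are hypothetical and the sample numbers in the usage lines are made up for illustration, not taken from the project's data.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between true and predicted like counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only.
example_mse = mse([1, 2, 3], [1, 2, 4])
example_r2 = r_squared([1, 2, 3], [1, 2, 4])
```

An R-squared of about 0.37 means the model explains roughly 37% of the variance in like counts.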
