Overview

GitHub:

https://github.com/kimsup10/octopus

This is a web application that analyzes data originating from Instagram.

It provides two features:

• Informing an Instagram user of the taste groups among his/her followers
• Predicting how many likes he/she will receive for a post

Technical Stack

• NumPy
• pickle
• KoNLPy
• d3.js
• Selenium
• ResNet

Contribution

Oops, I'm missing from the contributors list on GitHub. This is due to my GitHub account settings; my contributions can still be identified in the commit history.

Project Detail

Data Collection

On the initial request, every page of the target user's Instagram profile is crawled using Selenium WebDriver, and the result is cached on disk using pickle.
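The crawl-once, cache-on-disk pattern can be sketched as below. `load_or_crawl` and the `crawl` callable are hypothetical: `crawl` stands in for the Selenium WebDriver code that scrolls through the profile, and the cache layout (one `<username>.pkl` file under `cache_dir`) is an assumption.

```python
import os
import pickle


def load_or_crawl(username, crawl, cache_dir="cache"):
    """Return crawled data for `username`, crawling only on the first request.

    `crawl` is a hypothetical callable (e.g. one driving Selenium WebDriver
    through every page of the profile) that returns picklable data.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{username}.pkl")
    if os.path.exists(path):
        # Cache hit: skip the slow browser crawl entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: crawl once, then persist the result for later requests.
    data = crawl(username)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Note that `pickle.load` should only be used on cache files the application wrote itself, since unpickling untrusted data can execute arbitrary code.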

Preprocessing

• Used KoNLPy's Mecab tagger to perform morphological analysis on post captions. Only words tagged as nouns are treated as keywords, since specific nouns, rather than the grammatical elements of a caption, were judged to be what affects the number of likes.
• Calculated the conditional probabilities used by the Naive Bayes model from these noun-level features.
• Used Keras' pre-trained ResNet model to detect objects and animals in the posted photos, and used the detected labels as features.
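The preprocessing steps above can be sketched as follows, under several assumptions: the caption has already been tagged (for example with KoNLPy's `Mecab().pos(caption)`, which returns `(morpheme, tag)` pairs), the photo's ResNet predictions have already been decoded with Keras' `decode_predictions`, only the `NNG`/`NNP` noun tags are kept, and the per-user Naive Bayes model uses Laplace smoothing. `post_features`, `UserLikeModel`, and `min_score` are hypothetical names, not the project's API.

```python
import math

# Mecab (Sejong-based) tags for general and proper nouns; which noun tags
# the project actually kept is an assumption.
NOUN_TAGS = {"NNG", "NNP"}


def post_features(tagged, decoded, min_score=0.2):
    """Build one post's feature set from caption nouns and photo labels.

    tagged:  (morpheme, tag) pairs, e.g. the output of konlpy's Mecab().pos(caption).
    decoded: (class_id, class_name, score) tuples, the shape of one entry of
             Keras' decode_predictions output for the post photo.
    min_score is a hypothetical confidence threshold.
    """
    nouns = {word for word, tag in tagged if tag in NOUN_TAGS}
    # Prefix image labels so they never collide with caption words.
    labels = {"img:" + name for _, name, score in decoded if score >= min_score}
    return nouns | labels


class UserLikeModel:
    """Naive Bayes estimate of P(one user likes a post | post features).

    Laplace (+1) smoothing and scoring only the features present in a post
    are simplifying assumptions of this sketch.
    """

    def __init__(self, posts, liked):
        self.n_like = sum(1 for y in liked if y)
        self.n_skip = len(liked) - self.n_like
        self.like_counts, self.skip_counts = {}, {}
        for features, y in zip(posts, liked):
            counts = self.like_counts if y else self.skip_counts
            for f in features:
                counts[f] = counts.get(f, 0) + 1

    def cond_prob(self, feature, liked):
        """Smoothed P(feature | liked) or P(feature | not liked)."""
        if liked:
            return (self.like_counts.get(feature, 0) + 1) / (self.n_like + 2)
        return (self.skip_counts.get(feature, 0) + 1) / (self.n_skip + 2)

    def prob_like(self, features):
        """Posterior probability that this user likes a post with `features`."""
        # Smoothed prior odds, then each feature's likelihood ratio (in log space).
        log_odds = math.log((self.n_like + 1) / (self.n_skip + 1))
        for f in features:
            log_odds += math.log(self.cond_prob(f, True) / self.cond_prob(f, False))
        return 1 / (1 + math.exp(-log_odds))
```

Summing `prob_like` over every engaged user's model gives an expected number of likers for a new post, a natural input for the regression step described under Modeling.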

Modeling

Taste group clustering: K-Means clustering

K
The number of clusters K is set to the square root of the number of engaged users being clustered (K = √N).

Distance
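$$\mathrm{Distance}(A, B) = \mathrm{HammingDistance}(V_A, V_B)$$
where $V_A$ is user A's like vector: its $i$-th element is 1 if user A likes post $i$ and 0 otherwise, e.g. $V_A = \langle 0, 1, 1, 0, 0, 0, 0, 1 \rangle$.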
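The clustering described above can be sketched as follows. This is a minimal illustration, not the project's code: `cluster_taste_groups` and `hamming_distance` are hypothetical names, K is rounded to the nearest integer (the rounding rule is an assumption), and initial centroids are drawn at random from the users. Because the mean of binary like vectors is generally not binary, the centroid update here uses a per-post majority vote (the k-modes update, which minimizes the total Hamming distance within a cluster) in place of the arithmetic mean of standard K-Means; the original project may have handled this differently.

```python
import numpy as np


def hamming_distance(a, b):
    """Number of posts on which two binary like vectors disagree."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))


def cluster_taste_groups(likes, n_iter=20, seed=0):
    """K-Means-style clustering of binary like vectors under Hamming distance.

    likes: (n_users, n_posts) 0/1 array, likes[u, i] == 1 if user u liked post i.
    Returns (labels, centroids).
    """
    likes = np.asarray(likes, dtype=np.int8)
    n_users = likes.shape[0]
    k = max(1, int(round(np.sqrt(n_users))))  # K = sqrt(N), rounded (assumption)
    rng = np.random.default_rng(seed)
    centroids = likes[rng.choice(n_users, size=k, replace=False)].copy()
    labels = np.full(n_users, -1)
    for _ in range(n_iter):
        # Vectorized hamming_distance from every user to every centroid: (n_users, k).
        dists = (likes[:, None, :] != centroids[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = likes[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                # Per-post majority vote keeps the centroid binary.
                centroids[c] = (members.mean(axis=0) >= 0.5).astype(np.int8)
    return labels, centroids
```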

Likes prediction: Regression with Naive Bayes

We combine a regression model with a Naive Bayes model to predict the number of likes. Naive Bayes estimates the probability that a user will like a post containing specific words and pictured objects; this probability is then multiplied by regression parameters to estimate the expected number of likes.

Evaluation

• R-squared: 0.36853623771555538
• MSE: 192.87811680608883
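For reference, an R-squared of about 0.37 means the model explains roughly 37% of the variance in like counts, and an MSE of about 192.9 corresponds to a root-mean-square error of about 13.9 likes per post. The sketch below shows one way the regression step and both metrics could be computed, assuming each post's regressor is the sum over engaged users of their Naive Bayes like probabilities; `fit_like_regression` is a hypothetical name, and the source does not say whether the figures above were measured on held-out posts. The metric functions follow the standard definitions (scikit-learn's `r2_score` and `mean_squared_error` compute the same quantities).

```python
import numpy as np


def fit_like_regression(expected, observed):
    """Least-squares fit of observed_likes ≈ w * expected + b.

    `expected` is assumed to be, per post, the sum over engaged users of the
    Naive Bayes like probabilities (a hypothetical input for this sketch).
    """
    x = np.asarray(expected, dtype=float)
    A = np.column_stack([x, np.ones_like(x)])  # slope column + intercept column
    (w, b), *_ = np.linalg.lstsq(A, np.asarray(observed, dtype=float), rcond=None)
    return w, b


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted like counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)
```

Predictions for new posts would then be `w * expected + b`, and `r_squared(observed, predictions)` and `mse(observed, predictions)` give the two scores.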