Enrichment of Influencer Database

1. Result

The result is an enriched database that cuts the manual work roughly sixfold:
the rate of finding good candidates rose from 5 per hour to 30 per hour.

  • Final enriched data: Link
    Snippet of the data

  • The full data enrichment process: Link

2. Background

I was working, indirectly, for an influencer marketing agency, and I automated a manual task by enriching a database.
My "task" was to go through a channel list on a platform called Noxinfluencer and check each channel manually.
Once a channel met the criteria, I had to copy some of its information by hand into a master Google Sheet.

  • The criteria were as follows:
    • Good indicators:
      • Content is in English [covered by Noxinfluencer]
      • Creator lives in an Anglosphere country [covered by Noxinfluencer]
      • Channel has at least 10,000 subscribers [covered by Noxinfluencer]
      • Content contains "talks"
      • Average views per video >= 10,000
      • Average video duration >= 10 minutes
      • Average upload period < 6 months
    • Bad indicators:
      • Content is music only [sometimes covered by Noxinfluencer]
      • Content is meant for kids
  • Elements to capture manually if a channel meets the criteria:
    • Channel's Name
    • Channel's URL
    • Channel's Category/Keyword
    • Channel's Average video views (in thousands)
    • Channel's Videos count
    • Creator's Email (If found in channel's description)
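
A minimal sketch of how the numeric thresholds and the email capture above could be checked in Python. The `ChannelSummary` container, its field names, and the email pattern are assumptions made for illustration; they are not taken from the original workflow, and the pattern will miss obfuscated addresses.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately simple pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ChannelSummary:
    """Hypothetical per-channel record; field names are illustrative."""
    name: str
    url: str
    subscribers: int
    avg_views: float          # mean views over the sampled videos
    avg_duration_min: float   # mean video duration in minutes
    description: str


def meets_numeric_criteria(ch: ChannelSummary) -> bool:
    """Apply the three numeric thresholds from the criteria list."""
    return (
        ch.subscribers >= 10_000
        and ch.avg_views >= 10_000
        and ch.avg_duration_min >= 10
    )


def find_email(text: str) -> Optional[str]:
    """Return the first email-like string in the text, or None."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None
```

The text-based indicators ("talks", music only, meant for kids) are not covered here, since they need more than a threshold comparison.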

Example of the Master Google Sheet:

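| Name                    | URL | Category                        | AVG views (K) | Email                    | Video count |
|-------------------------|-----|---------------------------------|---------------|--------------------------|-------------|
| A Walk on the Wild Side | URL | VLOG Tourism Entertainment      | 9             |                          | 539         |
| raimi reyes             | URL | Life Style Beauty raimi         | 9             | raimi@gleamfutures.com   | 149         |
| elanna pecherle         | URL | Beauty Makeup Film & Animation  | 8             | jessica@collabagency.com | 593         |
| Milk Man Steve          | URL | Gaming Action-adventure         | 5             | oofgangfire@gmail.com    | 94          |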

3. Plan

  • Gather all YouTube channel links from Noxinfluencer after applying the four basic filters.
  • Gather data from each channel's last 20 videos using a library called Pytube (which no longer works since YouTube updated its API).
    • Reporting requirement: Email, found by analyzing the text of the channel description and the About page.
    • Criteria 4 & 5: Analyzing the description of each video for words such as short film, hip hop, ASMR, AMV, Fortnite, Minecraft, and Roblox.
    • Criterion 6: Using two metrics I developed myself to help determine whether there is speech throughout the video.
      • Narrative score \[\text{Video narrative score}=\frac{\text{Subtitle length}}{\text{Video duration}}\] \[\text{Channel narrative score}=\frac{\sum_{i=1}^{20}\text{Video narrative score}_i}{20}\]
      • Narrative probability \[\text{Video narration}_i= 1 \text{ if auto-subtitles exist, else } 0\] \[\text{Channel narrative probability}=\frac{\sum_{i=1}^{20}\text{Video narration}_i}{20}\]
    • Criterion 7: The average views of the last 20 videos.
    • Criterion 8: The average video duration of the last 20 videos.
    • Criterion 9: Was not possible because of a limitation of the Google API.
  • Developing a score that prioritizes channels for manual review \[ \text{Score}=\frac{\log(\text{video count}) \cdot \log(\bar{x}_\text{views}) \cdot \log(\bar{x}_\text{length}) \cdot \log(\text{Channel narrative score})}{\log(\text{subs})} \cdot P(\text{Worth}) \cdot P(\text{Channel narrative}) \]
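
The two narrative metrics and the priority score from the plan can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the `Video` record and its field names are hypothetical, subtitle length is assumed to be measured in characters and duration in seconds, the logarithm is assumed to be natural, and `p_worth` is passed in because the plan does not define how P(Worth) is estimated. The averages divide by the number of videos supplied, which equals 20 when the last 20 videos are used.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Video:
    """Hypothetical per-video record; field names are illustrative."""
    duration_s: float               # video duration in seconds (assumed unit)
    views: int
    auto_subtitles: Optional[str]   # None when no auto-generated subtitles exist


def video_narrative_score(v: Video) -> float:
    """Subtitle length divided by video duration (0 without subtitles)."""
    if v.auto_subtitles is None or v.duration_s <= 0:
        return 0.0
    return len(v.auto_subtitles) / v.duration_s


def channel_narrative_score(videos: Sequence[Video]) -> float:
    """Mean of the per-video narrative scores."""
    return sum(video_narrative_score(v) for v in videos) / len(videos)


def channel_narrative_probability(videos: Sequence[Video]) -> float:
    """Share of videos that have auto-generated subtitles."""
    return sum(1 for v in videos if v.auto_subtitles is not None) / len(videos)


def priority_score(videos: Sequence[Video], video_count: int,
                   subs: int, p_worth: float) -> float:
    """Priority score from the plan; returns 0.0 when a log is undefined."""
    if not videos:
        return 0.0
    mean_views = sum(v.views for v in videos) / len(videos)
    mean_length = sum(v.duration_s for v in videos) / len(videos)
    narrative = channel_narrative_score(videos)
    terms = (video_count, mean_views, mean_length, narrative)
    # Guard (an added assumption): log needs positive inputs, and
    # log(subs) must be non-zero to be used as the denominator.
    if min(terms) <= 0 or subs <= 1:
        return 0.0
    numerator = math.prod(math.log(t) for t in terms)
    return (numerator / math.log(subs)
            * p_worth * channel_narrative_probability(videos))
```

Note that a channel narrative score below 1 makes its logarithm negative, so the overall score can become negative; how to treat that case is left open here.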

Date: 2023-03-08 Wed 20:54

Created: 2025-08-16 Sat 18:30
