npm install
nest generate resource
npm run start
npm run start:dev
npm run start:prod
npm run test
npm run test:e2e
npm run test:cov
Nest framework TypeScript starter repository.
http://localhost:3000/process-data <-- parses the real-time trends and saves them to a file.
http://localhost:3000/predictions <-- loads daily trends and compares to the above
GET http://localhost:3000/logistic-regression <-- train the logistic regression model on data loaded from files
POST http://localhost:3000/logistic-regression/train <-- train the dataset from the payload body
POST http://localhost:3000/logistic-regression/predict <-- predict a major trend from the payload
We get the results of two data sources and merge then together for the model training data.
These two sources are the arguments to the processData(realTimeTrendsData: any, realTimeTrendsPageData: any)
function.
The idea is to retrieve real-time trends from the Google Trends API and use a machine learning model to predict if a search trend will become a major trend with over 500,000 searches.
We'll need to gather relevant data for training the model, fine-tune the model, and perform testing and validation to achieve accurate predictions.
Below is an outline of the data gathering and creation process.
Collect data from the Google Trends API (realTimeTrendsData) and the web page (realTimeTrendsPageData). Match the trends in both data sources to create a unified dataset.
Clean the data: Handle missing values, remove duplicates, and ensure consistency. Extract relevant features from both sources. Some possible features include:
- Trend title
- Entity names
- URLs of news articles or sources
- Image URLs (if needed for any analysis)
- Trending status (whether it crossed 500,000 searches or not)
For the unified dataset, we'll need to label each trend to indicate whether it became a major trend (with over 500,000 searches) or not. This information can be gathered using the dailyTrends API once a day.
Store the preprocessed and labeled data in a structured format, such as a CSV file or a database, for easy retrieval during model training. We will start with json files on the local for now.
Split the dataset into training and testing sets. Train the Decision Tree Classifier on the training set using the relevant features as input and the trend's status (major trend or not) as the output label.
Evaluate the trained model's performance on the testing set to assess its accuracy and other relevant metrics.
Once the model is trained and evaluated, we can use it to make predictions on new data.
The dataset should contain rows, where each row represents a trend, and columns represent features and labels. Here's how a brief snapshot of the dataset might look:
--------------------------------------------------------------------
| Trend Title | Entity Names | IsMajorTrend |
--------------------------------------------------------------------
| Chelsea F.C. | Football, Sports | 1 |
| Quantum Computing | Technology, Science | 0 |
| Travis Scott | Music, Entertainment | 1 |
| ... | ... | ... |
--------------------------------------------------------------------
In this example, the dataset includes a "Trend Title" column representing the title of each trend, an "Entity Names" column containing related entities, and an "IsMajorTrend" column (the label) with a value of 1 if the trend became a major trend and 0 otherwise.
Gather historical search trend data for multiple search trends. For each search trend, collect features that might be relevant for prediction. Features could include attributes like the number of searches at different time intervals, the trend's category, previous trend performance, related news articles, etc.
Using the google trends api to get current real-time trending data.
Example results:
{
"featuredStoryIds":[
],
"trendingStoryIds":[
"US_lnk_7E8AhgAAAADsSM_en",
"US_lnk_VuUchwAAAABK4M_en",
"US_lnk_6PschwAAAAD0_M_en",
...
"US_lnk_ktXjhgAAAABx0M_en",
"US_lnk_csIrhwAAAABZxM_en",
"US_lnk_YcIFhgAAAABkxM_en"
],
"storySummaries":{
"featuredStories":[
],
"trendingStories":[
{
"image":{
"newsUrl":"https://panhandle.newschannelnebraska.com/story/49213002/severe-thunderstorm-warning-northeastern-cheyenne-county",
"source":"PANHANDLE - NEWS CHANNEL NEBRASKA",
"imgUrl":"https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcREw9IokIqdJMR1JeZyS1njGN5hUc13Di78P0Y9wRiv9w8cTcko5_eUYUCu-FQ"
},
"shareUrl":"https://trends.google.com/trends/trendingsearches/realtime?id=US_lnk_7E8AhgAAAADsSM_en&category=all&geo=US#US_lnk_7E8AhgAAAADsSM_en",
"articles":[
{
"articleTitle":"Severe thunderstorm warning: northeastern Cheyenne County",
"url":"https://panhandle.newschannelnebraska.com/story/49213002/severe-thunderstorm-warning-northeastern-cheyenne-county",
"source":"PANHANDLE - NEWS CHANNEL NEBRASKA",
"time":"2 days ago",
"snippet":"IMPACT...People and animals outdoors will be injured. Expect hail damage to \nroofs, siding, windows, and vehicles. Expect wind"
}
],
"idsForDedup":[
"/m/01p90g /m/01q4np",
"/m/01p90g /m/0jb2l",
"/m/01q4np /m/0jb2l"
],
"id":"US_lnk_7E8AhgAAAADsSM_en",
"title":"Thunderstorm, Severe thunderstorm warning, National Weather Service",
"entityNames":[
"Thunderstorm",
"Severe thunderstorm warning",
"National Weather Service"
]
},
{
"image":{
"newsUrl":"https://hypebeast.com/2023/7/travis-scott-bad-bunny-the-weeknd-new-single-kpop-release-date-info",
"source":"Hypebeast",
"imgUrl":"https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQS7TmScXGUF7u2R6cDt4MgoqSjX0S0r0AN58FIbLyucXpQ4F3Bw12ADEkAEec"
},
"shareUrl":"https://trends.google.com/trends/trendingsearches/realtime?id=US_lnk_rTjvhgAAAABCPM_en&category=all&geo=US#US_lnk_rTjvhgAAAABCPM_en",
"articles":[
{
"articleTitle":"Travis Scott New Single \"KPOP\" Release Info",
"url":"https://hypebeast.com/2023/7/travis-scott-bad-bunny-the-weeknd-new-single-kpop-release-date-info",
"source":"Hypebeast",
"time":"1 day ago",
"snippet":"Travis Scott continues to kick off his UTOPIA rollout with the announcement \nof the first single “KPOP.” The forthcoming track is set to..."
},
...
],
"idsForDedup":[
"/g/11gdq15782 /m/02yh8l",
"/g/11gdq15782 /m/0gjdn4c",
"/g/11gdq15782 /m/0sghzm9",
"/m/02yh8l /m/0gjdn4c",
"/m/02yh8l /m/0sghzm9",
"/m/0gjdn4c /m/0sghzm9"
],
"id":"US_lnk_rTjvhgAAAABCPM_en",
"title":"Travis Scott, Bad Bunny, The Weeknd, K-pop",
"entityNames":[
"Travis Scott",
"Bad Bunny",
"The Weeknd",
"K-pop"
]
},
{
"image":{
"newsUrl":"https://www.tomshardware.com/news/russian-company-presents-16-qubit-quantum-computing-qpu-to-vladimir-putin",
"source":"Tom's Hardware",
"imgUrl":"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRRbDdBnJvfmPJhxKh8Q7vlxcHaybdpXSneNIjUOISidl38A_7Vp9Dc_M_FS8A"
},
"shareUrl":"https://trends.google.com/trends/trendingsearches/realtime?id=US_lnk_2FkAhwAAAADYXM_en&category=all&geo=US#US_lnk_2FkAhwAAAADYXM_en",
"articles":[
{
"articleTitle":"Russian Company Presents 16-qubit Quantum Computer to Vladimir Putin",
"url":"https://www.tomshardware.com/news/russian-company-presents-16-qubit-quantum-computing-qpu-to-vladimir-putin",
"source":"Tom's Hardware",
"time":"1 day ago",
"snippet":"Rosatom at the Forum for Future Technologies in Moscow, Russia showcased \nwhat it says is a working, 16-qubit quantum computer based on..."
}
],
"idsForDedup":[
"/m/01lps /m/069kd",
"/m/01lps /m/06b3x",
"/m/01lps /m/06bnz",
"/m/01lps /m/08193",
"/m/069kd /m/06b3x",
"/m/069kd /m/06bnz",
"/m/069kd /m/08193",
"/m/06b3x /m/06bnz",
"/m/06b3x /m/08193",
"/m/06bnz /m/08193"
],
"id":"US_lnk_2FkAhwAAAADYXM_en",
"title":"Quantum computing, Russia, Computing, Vladimir Putin, Qubit",
"entityNames":[
"Quantum computing",
"Russia",
"Computing",
"Vladimir Putin",
"Qubit"
]
}
]
},
"date":"Jul 21, 2023",
"hideAllImages":false
}
This will be parsed using Cheerio to get the trend titles and the graph of search volume over time.
This data can be used to mark the above data as either major trend or not. If the saved titles appear on the daily trends list, then they did indeed become major searches.
If they do not appear on the list, then the are marked as not major trends.
Clean and pre-process the collected data to make it suitable for training the machine learning model. This may involve handling missing values, encoding categorical features, scaling numerical features, and splitting the data into training and testing sets.
Instead of using tensore-flow, isn't there a mathematical way to take sparkline data like this and determine programmatically if a trend has incremental growth potential?
One common approach is to use time series analysis techniques to identify patterns and trends in the data. Here's a high-level outline of how we can approach this programmatically:
Data Preparation: Extract the data points from the element and convert them into a time series dataset. In our case, it seems like the data points are represented as a series of (x, y) coordinates in the SVG path. We need to parse the coordinates and create a time series with the appropriate time intervals (assuming each point represents a specific time).
Time Series Analysis: Once we have the time series data, we can perform various time series analysis techniques to determine if the trend has incremental growth potential. Some common methods include:
Moving Averages: Calculate the moving average of the time series to smooth out noise and identify trends. Trend Decomposition: Decompose the time series into its components like trend, seasonality, and residuals to analyze the trend separately. Autocorrelation: Check for any significant correlation between the time series and its past lags, which indicates persistence in the trend. Incremental Growth Potential Assessment: Based on the analysis from the time series techniques, we can programmatically determine if the trend exhibits incremental growth potential. Look for characteristics such as steady upward movement, increasing trend values over time, and minimal or no significant fluctuations.
Threshold Determination: To identify if the trend has incremental growth potential, we might need to define specific thresholds or rules based on the domain or problem requirements. For example, we could set a minimum percentage increase over a specific time period to classify a trend as having incremental growth potential.
Alerting Mechanism: Depending on our use case, we can implement an alerting mechanism to trigger notifications or actions when a trend is identified as having incremental growth potential. This can be useful for real-time monitoring or decision-making purposes.
It's important to note that time series analysis can be complex, and the best approach will depend on the nature of our data and the specific problem we are trying to solve. Additionally, while mathematical and statistical methods can provide valuable insights, they might not capture all aspects of more complex trends. In such cases, machine learning techniques like those offered by TensorFlow could be beneficial.
Keep in mind that analyzing and predicting trends in real-world data can be challenging, and it's often a combination of multiple approaches and domain expertise that yields the most accurate results.
relatedQueries: Users searching for our term also searched for these queries. The following metrics are returned:
Top - The most popular search queries. Scoring is on a relative scale where a value of 100 is the most commonly searched query, 50 is a query searched half as often, and a value of 0 is a query searched for less than 1% as often as the most popular query. Rising - Queries with the biggest increase in search frequency since the last time period. Results marked "Breakout" had a tremendous increase, probably because these queries are new and had few (if any) prior searches.
Here is the call to get this data.
return googleTrends.dailyTrends(
{
trendDate: new Date(),
geo: 'US',
},
function (err, results) {
if (err) {
return err;
} else {
const defaultObj = JSON.parse(Object(results)).default
.trendingSearchesDays[0].trendingSearches;
return defaultObj;
}
}
);
It will be used to annotate the data with a IsMajorTrend boolean.
npm install
# development
$ npm run start
# watch mode
$ npm run start:dev
# production mode
$ npm run start:prod
# unit tests
$ npm run test
# e2e tests
$ npm run test:e2e
# test coverage
$ npm run test:cov
Nest is an MIT-licensed open source project. It can grow thanks to the sponsors and support by the amazing backers. If you'd like to join them, please read more here.
- Author - Kamil Myśliwiec
- Website - https://nestjs.com
- Twitter - @nestframework
Nest is MIT licensed.