Twitter Streaming Language Classifier

2018-04-15 22:39:47


Scrape/collect a dataset.

Clean and explore the data, doing feature extraction.
Improve the model using more and more data, perhaps upgrading your infrastructure to support building larger models (for example, by migrating to Hadoop).
Apply the model in real time.

Examine the Tweets and Train a Model - Spark SQL is used to examine the dataset of Tweets. Then Spark MLlib is used to apply the K-Means algorithm to train a model on the data.
Apply the Model in Real-time - Spark Streaming and Spark MLlib are used to filter a live stream of Tweets for those that match the specified cluster.
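
To make the train-then-filter idea concrete, here is a minimal sketch in plain Python. It does not use the Spark SQL, Spark MLlib, or Spark Streaming APIs; it only illustrates the underlying technique on in-memory lists. Hashing character bigrams into a fixed-length vector is an assumption about the feature extraction, and every name in it (`featurize`, `train_kmeans`, `predict`, `filter_stream`, `NUM_FEATURES`) is hypothetical, chosen for illustration.

```python
import random

NUM_FEATURES = 1000  # assumed size of the hashed feature vector


def featurize(text, num_features=NUM_FEATURES):
    """Hash character bigrams into a fixed-length term-frequency vector."""
    vec = [0.0] * num_features
    lowered = text.lower()
    for i in range(len(lowered) - 1):
        bigram = lowered[i:i + 2]
        # Stable hash: Python's built-in hash() is salted per process.
        idx = sum(ord(c) * 31 ** k for k, c in enumerate(bigram)) % num_features
        vec[idx] += 1.0
    return vec


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centers, vec):
    """Return the index of the nearest cluster center."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], vec))


def train_kmeans(vectors, k, iterations=20, seed=42):
    """Lloyd's algorithm: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[predict(centers, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def filter_stream(tweets, centers, wanted_cluster):
    """Yield only the tweets whose nearest center is the wanted cluster."""
    for tweet in tweets:
        if predict(centers, featurize(tweet)) == wanted_cluster:
            yield tweet


if __name__ == "__main__":
    # Made-up sample data standing in for a collected dataset of Tweets.
    collected = [
        "the weather is lovely today",
        "what a great game that was",
        "el clima esta muy bonito hoy",
        "que gran partido fue ese",
    ]
    centers = train_kmeans([featurize(t) for t in collected], k=2)
    # Pick the cluster that a known example falls into, then filter on it.
    wanted = predict(centers, featurize("this is an english sentence"))
    live = ["good morning everyone", "buenos dias a todos"]
    for tweet in filter_stream(live, centers, wanted):
        print(tweet)
```

K-Means is unsupervised, so cluster numbers carry no language label. The sketch therefore picks the target cluster by classifying a known example first. With only a handful of training texts the clusters may not line up with languages, which is why a large collected dataset matters in practice.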

