Social Media Sentiment Analysis Tool
Scalable NLP pipeline processing 500k+ posts/hour using BERT and Apache Spark with 85% accuracy.
Overview
Developed a scalable NLP pipeline that analyzes more than 500,000 social media posts per hour in real time. The system combined a fine-tuned BERT model with distributed processing on Apache Spark and reached 85% accuracy on three-class sentiment classification (positive/negative/neutral).
Approach
- Data Collection: Live post streams from the Twitter and Reddit APIs.
- Preprocessing: Tokenization, stopword removal, lemmatization, stemming.
- Modeling: Fine-tuned BERT for sentiment polarity (positive/negative/neutral).
- Scalability: Implemented distributed processing using Apache Spark.
- Evaluation: Accuracy, precision, recall, F1-score, confusion matrix.
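The evaluation metrics listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual evaluation code: the function names, the `LABELS` tuple, and the six toy labels are assumptions made for the example, and in practice a library such as scikit-learn would likely be used instead.

```python
from collections import Counter

# Assumed label set, matching the three polarity classes named in the Modeling step.
LABELS = ("positive", "negative", "neutral")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return nested counts indexed as matrix[true_label][predicted_label]."""
    pairs = Counter(zip(y_true, y_pred))
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Compute accuracy, per-class precision/recall/F1, and macro-averaged F1."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    total = len(y_true)
    correct = sum(matrix[label][label] for label in labels)

    per_class = {}
    for label in labels:
        true_pos = matrix[label][label]
        predicted = sum(matrix[t][label] for t in labels)  # column sum
        actual = sum(matrix[label].values())               # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return {
        "accuracy": correct / total if total else 0.0,
        "per_class": per_class,
        "macro_f1": macro_f1,
        "confusion_matrix": matrix,
    }


# Toy data for illustration only; these are not the project's results.
y_true = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral", "negative", "positive"]
report = classification_metrics(y_true, y_pred)
print(report["accuracy"], report["macro_f1"])
```

Reporting per-class scores next to accuracy matters for sentiment data, where one class (often neutral) tends to dominate and can hide weak performance on the others.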
Results
- Achieved 85% accuracy in real-time sentiment classification.
- Scaled to process 500k+ posts/hour.
- Provided actionable insights for brands monitoring public opinion.
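To put the throughput figure in context, 500,000 posts per hour works out to roughly 139 posts per second across the whole cluster. The short sketch below does that conversion and shows how the load would divide across workers and inference batches. The worker count (8) and batch size (32) are hypothetical values chosen for the example, not the project's configuration.

```python
import math

POSTS_PER_HOUR = 500_000  # throughput figure from the results above


def posts_per_second(posts_per_hour):
    """Convert an hourly throughput into a per-second rate."""
    return posts_per_hour / 3600


def per_worker_rate(posts_per_hour, workers):
    """Posts per second each worker must sustain, assuming an even split."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return posts_per_second(posts_per_hour) / workers


def batches_per_second(posts_per_hour, batch_size):
    """Model inference batches needed per second, rounded up."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(posts_per_second(posts_per_hour) / batch_size)


cluster_rate = posts_per_second(POSTS_PER_HOUR)
worker_rate = per_worker_rate(POSTS_PER_HOUR, workers=8)         # hypothetical worker count
batch_rate = batches_per_second(POSTS_PER_HOUR, batch_size=32)   # hypothetical batch size
print(f"{cluster_rate:.1f} posts/s total, {worker_rate:.1f} posts/s per worker, "
      f"{batch_rate} batches/s")
```

An even split is a simplification; real streams are bursty, so a deployment would normally leave headroom above these averages.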
Skills demonstrated: transformer-based NLP (BERT), big data processing (Spark), distributed systems, real-time sentiment analysis.