Brennan

NLP, yeah, you know me

Using a dataset obtained from https://data.world/crowdflower/brands-and-product-emotions, I processed tweets collected from Twitter. By preprocessing the tweets before creating a neural network, I was able to predict the sentiment using term frequency vectorization. The dataset is unbalanced. When I opt for binary classification using only the tweets categorized as positive or negative sentiment, my data looks more like this: …
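As a rough illustration of what term frequency vectorization means here, the sketch below counts how often each vocabulary word appears in each tweet. It is a minimal pure-Python sketch, not the code from the post: the function names `tokenize` and `term_frequency_vectors` and the two example tweets are hypothetical, and the tokenizer is an assumed simplification of whatever preprocessing the post actually uses.

```python
import re
from collections import Counter


def tokenize(tweet):
    """Lowercase a tweet and split it into simple word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z']+", tweet.lower())


def term_frequency_vectors(tweets):
    """Return (vocabulary, vectors); each vector holds raw term counts per tweet."""
    tokenized = [tokenize(t) for t in tweets]
    # The vocabulary is every distinct token across all tweets, in sorted order.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this tweet.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical example tweets, not taken from the dataset.
tweets = ["I love my new phone", "I hate this app, hate it"]
vocabulary, vectors = term_frequency_vectors(tweets)
```

Each row of `vectors` lines up with `vocabulary`, so it can be fed to a classifier as a fixed-length input. In practice a library vectorizer would likely replace this hand-rolled version.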

Random Forests

XGBoost has been one of the leading algorithms in machine learning competitions for the past five years. XGBoost is an ensemble of decision trees built with gradient boosting. The difference between XGBoost and Random Forest lies in the structure of the trees. A random forest grows fully grown decision trees on subsamples of the data, growing…

Analysis & Regression King County Housing Dataset

My motivation for analyzing the King County Housing Dataset is a project for Module 2 of the Flatiron School Data Science program. I was provided a dataset with the below information included. To start off, I researched and assessed the dataset, beginning with the King County, Washington website, https://info.kingcounty.gov/.

Movie Industry Analysis

For the Mod 1 project, the problem posed is as follows: Your team is charged with doing data analysis and creating a presentation that explores what type of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the CEO can use when deciding what…