Big-Data-and-Information-System-IS-669

The objective of this course is to introduce students to Data Science approaches for mining large amounts of information and to the tools those approaches require. It also shows, through real use cases, what a company needs in order to create Big Data Centers of Excellence that successfully turn data analysis into a competitive advantage.


Project maintained by akshayjadhav21.

Airline Big Data Management

Introduction

This repository presents an Airline Big Data Management project that uses a dataset of airline carriers on an EC2 server (AWS) to explore big data management on a cloud computing platform. The project phases include loading the big data into HDFS using Pig or Hive; analyzing and visualizing airline delay data to find valuable insights; and building predictive machine learning models, selecting the best-performing one to identify delayed airline carriers. The predictive results are then used to propose a plan of action for reducing delay intervals.
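As a rough illustration of the delay-analysis phase, the sketch below computes, for each carrier, the mean arrival delay and the share of flights delayed by more than 15 minutes. It assumes the CSV has `UniqueCarrier` and `ArrDelay` columns and that missing delays are written as `NA`; the 15-minute threshold and the sample rows are illustrative assumptions, not values taken from this project.

```python
import csv
import io
from collections import defaultdict

# Assumed delay threshold in minutes (a common on-time convention, not from the project).
DELAY_THRESHOLD = 15


def carrier_delay_summary(lines):
    """Return {carrier: (mean_arr_delay, delayed_fraction)} from CSV lines.

    Assumes columns named UniqueCarrier and ArrDelay; rows whose ArrDelay is
    empty or "NA" (e.g. cancelled flights) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    delayed = defaultdict(int)
    for row in csv.DictReader(lines):
        raw = row["ArrDelay"].strip()
        if raw in ("", "NA"):
            continue
        delay = float(raw)
        carrier = row["UniqueCarrier"]
        totals[carrier] += delay
        counts[carrier] += 1
        if delay > DELAY_THRESHOLD:
            delayed[carrier] += 1
    return {c: (totals[c] / counts[c], delayed[c] / counts[c]) for c in counts}


# Hypothetical sample rows, for illustration only.
sample = io.StringIO(
    "UniqueCarrier,ArrDelay\n"
    "AA,30\n"
    "AA,-5\n"
    "UA,10\n"
    "UA,NA\n"
    "UA,20\n"
)
summary = carrier_delay_summary(sample)
for carrier, (mean_delay, frac) in sorted(summary.items()):
    print(f"{carrier}: mean delay {mean_delay:.1f} min, {frac:.0%} delayed")
```

On the full dataset this kind of aggregation is better pushed to Hive or Pig on the cluster; the in-memory version here only shows the logic.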

Machine Learning Algorithms

  1. Support Vector Machine (SVM)
  2. Decision Tree
  3. Random Forest
  4. Naive Bayes
  5. K-Nearest Neighbors (KNN)
  6. Logistic Regression
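To make one of the listed algorithms concrete, here is a minimal from-scratch k-nearest-neighbors classifier in plain Python. It is an illustrative sketch, not the project's code: the two features (scheduled departure hour and distance in hundreds of miles) and the toy labels (1 = delayed, 0 = on time) are hypothetical. Because KNN relies on Euclidean distance, a real pipeline would normally scale the features first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common keeps first-seen order, i.e. the nearer neighbor's label wins.
    return votes.most_common(1)[0][0]


# Hypothetical training data: (departure hour, distance in hundreds of miles).
train_X = [(6, 3.0), (7, 2.5), (8, 4.0), (18, 3.5), (19, 2.0), (20, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]

morning = knn_predict(train_X, train_y, (7.5, 3.0))
evening = knn_predict(train_X, train_y, (19.5, 3.0))
print(morning, evening)
```

With `k=3` and two classes a vote can never tie, so the tie-breaking rule only matters for even `k` or more classes.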

Accuracy Results Comparison

[Figure: Accuracy Results]
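Accuracy here is the fraction of test flights whose delayed/on-time label a model predicts correctly. A minimal way to compute and rank it across several models is sketched below; the model names, labels, and predictions are made-up placeholders, not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions):
    """Return (name, accuracy) pairs sorted from most to least accurate."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder labels (1 = delayed, 0 = on time) and predictions from hypothetical models.
y_test = [1, 0, 0, 1, 1, 0, 0, 1]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 1],
    "model_b": [1, 1, 0, 0, 1, 0, 1, 1],
    "model_c": [0, 0, 0, 0, 0, 0, 0, 0],
}
for name, acc in rank_models(y_test, predictions):
    print(f"{name}: {acc:.2%}")
```

`model_c` always predicts "on time" and still reaches 50% on this balanced toy set; on real delay data, where on-time flights usually dominate, such a majority-class baseline can score higher still, so accuracy is best read against it.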

Platforms

EC2 Server (AWS), Jupyter Notebooks

Dataset Reference

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HG7NV7