This course introduces students to data science approaches for mining large amounts of information and to the tools those approaches require. It also draws on real use cases to show what a company needs in order to build a Big Data Center of Excellence and turn data analysis into a competitive advantage.
This repository presents an Airline Big Data Management project that uses an airline carrier dataset on an EC2 server (AWS) to explore big data management on a cloud computing platform. The project phases include loading the data into HDFS using Pig or Hive, analyzing and visualizing airline delay data to find useful insights, and building predictive (machine learning) models to identify carriers prone to delays. The predictions are then used to recommend a plan of action for reducing delay intervals.
EC2 Server (AWS), Jupyter Notebooks
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HG7NV7
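
The delay analysis phase described above can be illustrated with a minimal sketch that computes the share of delayed flights per carrier. It is not the project's actual code. The column names `UniqueCarrier` and `ArrDelay`, the 15-minute threshold, the helper name `delay_rate_by_carrier`, and the sample rows are all assumptions made for illustration. Check them against the real dataset files before reuse.

```python
import csv
import io
from collections import defaultdict

# Assumed column names (not stated in this README): UniqueCarrier, ArrDelay.
# Assumed threshold: a flight counts as delayed when it arrives 15+ minutes late.
DELAY_THRESHOLD_MIN = 15


def delay_rate_by_carrier(rows, threshold=DELAY_THRESHOLD_MIN):
    """Return {carrier: fraction of flights arriving at least `threshold` minutes late}."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in rows:
        raw = row.get("ArrDelay", "")
        # Skip rows without a usable arrival delay (empty or "NA").
        if raw in ("", "NA"):
            continue
        carrier = row["UniqueCarrier"]
        totals[carrier] += 1
        if float(raw) >= threshold:
            delayed[carrier] += 1
    return {carrier: delayed[carrier] / totals[carrier] for carrier in totals}


# Tiny made-up sample standing in for one CSV file of the dataset.
SAMPLE_CSV = """UniqueCarrier,ArrDelay
AA,20
AA,-3
AA,NA
WN,5
WN,45
WN,16
"""

rates = delay_rate_by_carrier(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Print carriers from highest to lowest delay rate.
for carrier, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{carrier}: {rate:.2f}")
```

To run this against a real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an opened CSV file. Because the function reads rows one at a time, it never holds the full file in memory, though at the scale this project targets the same aggregation would more likely be expressed as a Hive or Pig query.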