Big Data Analytics (21CSH-471)
Understanding Big Data and the 5 V’s
- Introduction to Big Data – Definition and Characteristics
- The 5 V’s of Big Data
- Volume: Data at scale
- Velocity: Real-time data processing
- Variety: Structured, semi-structured, unstructured data
- Veracity: Uncertainty and trustworthiness in data
- Value: Transforming data into insights
- Challenges and Opportunities in Big Data
- Big Data Use Cases in Real-World Applications
Big Data Architecture
- Fundamentals of Big Data Architecture:
- Data ingestion
- Storage
- Processing and visualization layers
- Hadoop Ecosystem in Big Data Architecture:
- Tools like HDFS, YARN, Hive, and Sqoop
- Streaming Data in Big Data:
- Tools such as Apache Kafka and Flink
- Real-World Big Data Architecture:
- Lambda and Kappa Architectures
- Hybrid Architecture for batch and real-time processing
The Hadoop Ecosystem
- Introduction to the Hadoop Ecosystem
- HDFS (Hadoop Distributed File System):
- Architecture and Functionality
- MapReduce Programming Model:
- Workflow and Applications
- YARN (Yet Another Resource Negotiator):
- Resource Management
- Tools in the Ecosystem:
- Pig, HBase, Flume, and Oozie
- Data Processing with Hadoop:
- ETL, Analytics, and Reporting
Data Visualization (21CSH-461)
Chapter 1: Data Handling and Introduction to Visualization
- Data extraction, cleaning, and annotation