Big Data Analytics (21CSH-471)
The Iterative Nature of Data Science Projects
- Introduction to Data Science Projects:
  - Stages and lifecycle
- Iterative process in Data Science:
  - Problem definition
  - Data collection and exploration
  - Model development and evaluation
  - Refinement and deployment
- Importance of Iteration:
  - Continuous improvement and error correction
- Tools supporting Iteration:
  - Notebooks
  - Version control
  - CI/CD
Notebooks in Data Science
- Introduction to Data Science Notebooks:
  - Characteristics:
    - Interactive
    - Reproducible
    - Modular workflow
  - Key benefits:
    - Visualization
    - Documentation
    - Collaboration
- Programming Languages for Data Science:
  - Python: libraries such as pandas, NumPy, and Matplotlib
  - R: strengths in statistical analysis and visualization
- Mechanisms and Tools in Notebooks:
  - Code cells
  - Markdown
  - Widgets
  - Extensions
  - Integration with Git and other data tools
Notebooks and Data Science tools in Big Data
- Major Data Science Notebooks:
  - Jupyter Notebook
  - Google Colab
  - Zeppelin
- Comparing features:
  - Offline vs. cloud
  - Extensions
  - Performance
- Getting started with Jupyter Notebook:
  - Installation
  - Environment setup
  - Basic usage
  - Working with Python and R in Jupyter
- Introduction to Tableau:
  - Key features and use cases
  - Data connection
  - Visualization building
  - Dashboard creation
- Collaboration and Presentation tools for Data Insights
Data Visualization (21CSH-461)
Chapter 1: Programming and Tools for Statistical Data Visualization
- Java language for statistical data visualization
- Web-based statistical graphics using XML technologies
- Google Maps API for geographical data visualization
- Google Charts for creating interactive charts and graphs
- Tableau for advanced visualizations and heat map generation
Chapter 2: Rank Analysis and Trend Analysis Tools