
CANCELLED: Part 1 of 2: Accelerating data engineering pipelines (TrAC)
This two-part workshop dives deeper into engineering datasets using the ETL (Extract, Transform, Load) pipeline.
Data engineering is the foundation of data science and lays the groundwork for analysis and modeling. In order for organizations to extract knowledge and insights from structured and unstructured data, fast access to accurate and complete datasets is critical. Working with massive amounts of data from disparate sources requires complex infrastructure and expertise. Minor inefficiencies can result in major costs, both in terms of time and money, when scaled across millions to trillions of data points. Each session is roughly divided into 3 hours of active teaching and 1 hour of extra Q&A.
The concepts below will be spread over the two sessions:
- How data moves within a computer, and how to strike the right balance between CPU, DRAM, disk storage, and GPUs.
- How different file formats can be read and manipulated by hardware.
- How to scale an ETL pipeline with multiple GPUs using NVTabular.
- How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.
Upon successful completion of the assessment, you’ll receive an NVIDIA certificate.
Important
This is a two-part course.
Make sure you can attend both sessions before you register.
A laptop is required to participate. Students can borrow a laptop through the library’s Tech Lending program.
Prerequisites:
- Intermediate knowledge of Python (list comprehension, objects)
- Familiarity with pandas is a plus
- Introductory statistics (mean, median, mode)
- Date: Thursday, April 10, 2025
- Time: 1:00pm - 5:00pm
- Location: The Catalyst (Parks 199)
- Audience: Faculty, Grad students & postdocs, ISU staff, Undergrads
- Categories: Workshop > The Catalyst