Duration: 5 days
Certification: Certificate of attendance

Pentaho Kettle Fundamentals (3 Days) & Advanced (Additional 2 Days)
Day 1: Introduction & Core Concepts
• Course overview, objectives, and environment setup
• Introduction to PDI: architecture and components (Spoon, Pan, Kitchen, Carte)
• Navigating the Spoon UI: connections, repositories, and project structure
• Transformations fundamentals: steps, hops, and data flow
• Core input/output steps: CSV Input, Excel Input, Table Input, Text File Output, Table Output
• Lab: Build a simple CSV → Database transformation
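The Day 1 lab's flow (a CSV Input step hopped to a Table Output step in Spoon) can be sketched outside PDI in a few lines of Python; the file name, table name, and SQLite target below are illustrative stand-ins, not part of the lab spec:

```python
import csv
import sqlite3

def load_csv_to_table(csv_path: str, db_path: str, table: str) -> int:
    """Read a CSV file and load its rows into a database table,
    mirroring a CSV Input -> Table Output transformation."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the field names
        rows = list(reader)

    cols = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    conn.executemany(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", rows
    )
    conn.commit()
    conn.close()
    return len(rows)
```

In Spoon the same logic is two steps joined by a hop; the point of the sketch is the data movement (read, map fields, insert), not the tooling.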
Day 2: Data Transformation & Jobs
• Data transformation steps: Filter Rows, Sort Rows, Merge Join, Lookup steps
• String manipulation, data type conversion, and value mapping
• Introduction to Jobs: job entries, flows, success/failure routing
• Variables, parameters, and environment configuration
• Lab: Build a job that orchestrates multiple transformations with error handling
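Variables defined in kettle.properties (in the user's .kettle directory) are visible to every transformation and job as ${VAR}; a minimal sketch, with purely illustrative names and values:

```properties
# ~/.kettle/kettle.properties — names below are examples, not defaults
DB_HOST=localhost
DB_PORT=5432
STAGING_DIR=/data/staging
```

A Table Input step can then reference ${DB_HOST} in its connection settings, and a job can override the same names at run time with named parameters, which is what makes one .ktr reusable across environments.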
Day 3: Advanced Topics & Real-World Patterns
• Working with databases: connections, bulk loading, slowly changing dimensions (SCD)
• Executing PDI from the command line: Pan and Kitchen, scheduling with cron
• Logging, monitoring, and error handling best practices
• Performance tuning: row buffering, parallelism, partitioning
• Lab: End-to-end ETL pipeline (file ingestion → transform → load → job orchestration)
• Q&A and recap
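Pan executes transformations (.ktr) and Kitchen executes jobs (.kjb) from the shell, which is what cron scheduling hooks into; a sketch, where the install path, file paths, and parameter name are assumptions:

```shell
# Run a transformation once, passing a named parameter (paths are illustrative)
/opt/pdi/pan.sh -file=/etl/load_sales.ktr -param:RUN_DATE=2024-01-31 -level=Basic

# Crontab entry: run the nightly job at 02:00 and append output to a log
0 2 * * * /opt/pdi/kitchen.sh -file=/etl/nightly.kjb -level=Basic >> /var/log/etl/nightly.log 2>&1
```

Both tools return a non-zero exit code on failure, so a wrapper script or the cron mail mechanism can surface broken runs.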
Day 4: Intermediate–Advanced Transformations
• Stream Lookup vs. Database Lookup: performance trade-offs and use cases
• Advanced join patterns: Sorted Merge Join, fuzzy matching, and multi-stream merging
• Working with JSON and XML: parsing, generating, and transforming semi-structured data
• Dynamic SQL and parameterized queries in Table Input
• Using the JavaScript step and Formula step for complex business logic
• Handling large datasets: lazy conversion, compression, and streaming optimizations
• Lab: Build a transformation that processes nested JSON and loads a normalized relational model
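What the JSON Input step does with path expressions can be sketched in plain Python: flatten one nested order document into a parent row and child rows, shaped for two relational tables. The field names here are made up for the example:

```python
import json

def normalize_order(doc: str):
    """Split a nested order JSON document into one orders row and
    one order_items row per line item, as two Table Output steps
    loading a normalized model would expect."""
    order = json.loads(doc)
    order_row = {"order_id": order["id"], "customer": order["customer"]}
    item_rows = [
        {"order_id": order["id"], "sku": item["sku"], "qty": item["qty"]}
        for item in order["items"]
    ]
    return order_row, item_rows
```

The lab does the same split inside PDI, routing the parent and child streams to separate output steps so foreign keys stay consistent.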
Day 5: Orchestration, Deployment & Integration
• Advanced job design: sub-jobs, parallel execution, and dynamic file handling
• Metadata injection: building dynamic, reusable transformation templates
• Connecting to REST APIs and web services: HTTP Client, REST Client steps
• Integrating PDI with messaging systems (Kafka, JMS) and cloud storage (S3, SFTP)
• Deploying PDI on Carte server: remote execution and clustering basics
• CI/CD for PDI: version control, automated testing with PDI unit test framework
• Lab: Design and deploy a fully parameterized, scheduled pipeline with API ingestion, transformation logic, and database output
• Q&A and recap
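The API-ingestion leg of the final lab (REST Client step → JSON parsing → Table Output) can be sketched in Python; a canned payload stands in for the HTTP response so the sketch stays self-contained, and the endpoint shape, field names, and table are all hypothetical:

```python
import json
import sqlite3

def ingest_payload(payload: str, db_path: str) -> int:
    """Parse a JSON API response and upsert its records into a table,
    mirroring REST Client -> JSON parsing -> Table Output."""
    records = json.loads(payload)["results"]

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rates (currency TEXT PRIMARY KEY, rate REAL)"
    )
    # INSERT OR REPLACE makes re-runs idempotent, like a DB lookup/update pattern
    conn.executemany(
        "INSERT OR REPLACE INTO rates (currency, rate) VALUES (:currency, :rate)",
        records,
    )
    conn.commit()
    n = conn.execute("SELECT COUNT(*) FROM rates").fetchone()[0]
    conn.close()
    return n
```

In the deployed pipeline the payload would come from the REST Client step and the connection details from job parameters, so the same logic runs unchanged under Kitchen or Carte.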