Dataiku Demo — Customer Analytics Pipeline

End-to-end customer analytics pipeline that ingests Snowflake data into Dataiku DSS, computes RFM scores, CLV estimates, and churn risk, writes results back to Snowflake, and is mirrored on Databricks with validated parity.

Dataiku Flow

Architecture

Snowflake (DEV.DATAIKU_DEMO)
  ├── CUSTOMERS        (1,000 rows)
  └── TRANSACTIONS     (8,000 rows)
          │
          ▼  Dataiku DSS (DEMO project)
  ┌───────────────────────────────────────────┐
  │  [Shaker]  filter STATUS = 'completed'    │
  │      → transactions_completed             │
  │                                           │
  │  [Join]    LEFT JOIN on CUSTOMER_ID       │
  │      → customer_transactions_joined       │
  │                                           │
  │  [Python]  RFM + CLV + Churn analytics    │
  │      → CUSTOMER_ANALYTICS_OUTPUT          │
  └───────────────────────────────────────────┘
          │
          ▼
  Snowflake  DEV.DATAIKU_DEMO.CUSTOMER_ANALYTICS_OUTPUT
  Databricks dev.dataiku_demo.customer_analytics_output  ← migrated, parity verified
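
The three recipe steps in the diagram can be sketched in plain Python, so the logic can be read without a DSS environment. This is a minimal illustration, not the project's actual recipe code. CUSTOMER_ID and STATUS come from the diagram, while AMOUNT, TRANSACTION_DATE, the sample rows, and the function names are assumptions made for the example. The README does not say how RFM scores, CLV, or churn risk are derived, so the sketch stops at the raw recency, frequency, and monetary values.

```python
from collections import defaultdict
from datetime import date


def completed_only(transactions):
    """Shaker step: keep rows whose STATUS is 'completed'."""
    return [t for t in transactions if t["STATUS"] == "completed"]


def left_join(customers, transactions):
    """Join step: keep every customer, with a possibly empty transaction list."""
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t["CUSTOMER_ID"]].append(t)
    return {c["CUSTOMER_ID"]: by_customer.get(c["CUSTOMER_ID"], []) for c in customers}


def rfm(joined, as_of):
    """Python step (partial): raw recency in days, frequency, and monetary total."""
    out = {}
    for customer_id, txns in joined.items():
        if txns:
            last_purchase = max(t["TRANSACTION_DATE"] for t in txns)
            recency = (as_of - last_purchase).days
        else:
            recency = None  # customer has no completed transactions
        out[customer_id] = {
            "recency_days": recency,
            "frequency": len(txns),
            "monetary": sum(t["AMOUNT"] for t in txns),
        }
    return out


# Hypothetical sample rows; AMOUNT and TRANSACTION_DATE are assumed column names.
customers = [{"CUSTOMER_ID": 1}, {"CUSTOMER_ID": 2}, {"CUSTOMER_ID": 3}]
transactions = [
    {"CUSTOMER_ID": 1, "STATUS": "completed", "AMOUNT": 50.0, "TRANSACTION_DATE": date(2024, 1, 10)},
    {"CUSTOMER_ID": 1, "STATUS": "completed", "AMOUNT": 30.0, "TRANSACTION_DATE": date(2024, 3, 1)},
    {"CUSTOMER_ID": 1, "STATUS": "refunded", "AMOUNT": 99.0, "TRANSACTION_DATE": date(2024, 3, 5)},
    {"CUSTOMER_ID": 2, "STATUS": "completed", "AMOUNT": 20.0, "TRANSACTION_DATE": date(2024, 2, 1)},
]

metrics = rfm(left_join(customers, completed_only(transactions)), as_of=date(2024, 3, 31))
for customer_id, row in metrics.items():
    print(customer_id, row)
```

Because the join is a LEFT JOIN, customer 3 still appears in the output even though it has no completed transactions. Its frequency is zero and its recency is undefined.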

Parity validation with Datafold

Parity was validated using Datafold, a data reliability platform that runs cross-database diffs at scale using bisection hashing.

Datadiff run: https://app.datafold.com/datadiffs/13857162
Algorithm: bisection hash on CUSTOMER_ID
Result: 0 differences across all 1,000 rows

The validate_parity.py script uses the same open-source data-diff library that powers Datafold cloud.
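
The idea behind a bisection hash on CUSTOMER_ID can be sketched with in-memory dictionaries. This is an illustration of the general technique only. It is not the data-diff library's implementation and not the contents of validate_parity.py. The function names, the threshold parameter, and the sample rows are all hypothetical. A real tool would compute the segment checksums inside each database instead of pulling rows into Python.

```python
import hashlib


def checksum(rows, lo, hi):
    """Hash every row whose key falls in [lo, hi), in key order."""
    digest = hashlib.sha256()
    for key in sorted(k for k in rows if lo <= k < hi):
        digest.update(repr((key, rows[key])).encode())
    return digest.hexdigest()


def bisect_diff(a, b, lo, hi, threshold=4):
    """Return the sorted keys in [lo, hi) whose rows differ between a and b."""
    if checksum(a, lo, hi) == checksum(b, lo, hi):
        return []  # whole segment matches, so no row needs to be inspected
    if hi - lo <= threshold:
        # Segment is small enough to compare row by row.
        keys = {k for k in (a.keys() | b.keys()) if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    mid = (lo + hi) // 2
    # Split the key range in half and recurse only into mismatching halves.
    return bisect_diff(a, b, lo, mid, threshold) + bisect_diff(a, b, mid, hi, threshold)


# Hypothetical data: 1,000 rows keyed by an integer CUSTOMER_ID.
source = {cid: ("segment", cid * 1.5) for cid in range(1, 1001)}
target = dict(source)

clean = bisect_diff(source, target, 1, 1001)
print(clean)  # identical tables: one checksum comparison, no differences

target[417] = ("segment", 0.0)  # simulate a changed value
del target[900]                 # simulate a missing row
found = bisect_diff(source, target, 1, 1001)
print(found)
```

When the two tables match, a single checksum comparison over the full key range is enough, which is why this approach scales well for migrations that are expected to be identical.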
