Challenges Of Current Analytical Architecture for Data Scientists

Jayanth Jadhav
2 min read · Dec 6, 2023


Typical Analytic Architecture

The key aspects of the traditional data architecture are:

1. Enterprise data warehouses (EDWs) hold the central, validated datasets needed for critical business operations and reporting.

2. Loading data into an EDW requires extensive preprocessing, structuring, and conformance to schemas. This rigor supports security, backups, and related controls.

3. The rigorous EDW process makes quick, iterative analysis and exploration of new data sources difficult.

4. Departmental data marts often emerge alongside the EDW to give business teams more room for ad hoc analytics. These marts operate in isolation.

5. Many applications draw data from the EDW for company-wide BI, reporting and dashboards to power operational decisions.

6. For advanced analysis, analysts receive extracts from the EDW, which are moved in batches to local analytical tools.

7. This local analysis typically runs in memory on desktops and covers only samples of the data rather than full datasets.

8. Insights and findings from local analysis are rarely fed back or integrated into the EDW and upstream flows.

The EDW and its downstream systems are optimized for structured reporting rather than exploratory, ad hoc analysis. Data for the latter use case faces many restrictions.
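
The flow described above can be sketched as a small directed graph. This is only an illustrative model: the stage names (`source_systems`, `edw`, `analyst_extract`, and so on) are hypothetical labels chosen for this example, and the data mart is drawn as a disconnected node because the description says only that marts reside in isolation. A simple reachability check shows that data can travel from source systems to an analyst's local tools, but nothing travels back.

```python
from collections import deque

# Hypothetical stage names; each key lists the stages it sends data to.
FLOWS = {
    "source_systems": ["edw"],
    "edw": ["bi_reporting", "analyst_extract"],
    "analyst_extract": ["local_analysis"],
    "departmental_data_mart": ["business_team_analysis"],
    "bi_reporting": [],
    "local_analysis": [],
    "business_team_analysis": [],
}


def reachable(flows, start, target):
    """Return True if data can move from `start` to `target` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Data reaches the analyst, but only through the extract path.
print(reachable(FLOWS, "source_systems", "local_analysis"))  # True
# Findings from local analysis have no path back into the EDW.
print(reachable(FLOWS, "local_analysis", "edw"))  # False
# The data mart is a silo: nothing links it to the EDW in this model.
print(reachable(FLOWS, "departmental_data_mart", "edw"))  # False
```

Adding a single edge such as `"local_analysis": ["edw"]` would make the second check return `True`, which is the feedback loop the traditional architecture lacks.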

The challenges are:

1. Data enters enterprise data warehouses (EDWs) only after extensive preprocessing for structure, validation, and security. This limits flexibility.

2. EDWs support mission-critical operations, reporting, and BI under rigorous governance, but that governance constrains exploratory analysis.

3. Because of these EDW limitations, departmental data marts proliferate. They enable more analysis but remain siloed and unintegrated.

4. EDW data feeds downstream reporting and BI systems, which are critical organization-wide processes.

5. Analysts are provisioned extracts from the EDW for offline analytics, but their local tools are limited in scale and rely on samples.

6. Analytic results are rarely fed back into the main data flows or the EDW, because local analysis is disconnected from them.

7. The batched movement of data into the EDW and out to analysts, combined with governance focused on structured operations rather than exploration, creates barriers that keep data scientists from accessing data at scale.

In summary: data enters the EDW in batches after extensive validation focused on security and operations, propagates downstream to reporting systems, and reaches analysts only through limited extracts. This workflow is optimized for governance, operations, and business continuity rather than for empowering analysts.
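
To make the cost of sample-based extracts concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the simulated table, the row count, the 0.1% rare-event rate, and the function names are all hypothetical. It compares a rate measured on a full dataset with the same rate measured on a small extract of the kind an analyst might load on a desktop.

```python
import random


def make_full_dataset(n_rows=100_000, rare_rate=0.001, seed=42):
    """Simulate a warehouse table; each row is True when it is a rare event."""
    rng = random.Random(seed)
    return [rng.random() < rare_rate for _ in range(n_rows)]


def take_extract(rows, sample_size, seed=7):
    """Simulate a provisioned extract: a random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


def rare_event_rate(rows):
    """Fraction of rows flagged as rare events."""
    return sum(rows) / len(rows)


full = make_full_dataset()
extract = take_extract(full, sample_size=1_000)

print(f"rare events in full dataset: {sum(full)} of {len(full)}")
print(f"rare events in extract:      {sum(extract)} of {len(extract)}")
print(f"rate on full dataset: {rare_event_rate(full):.4%}")
print(f"rate on extract:      {rare_event_rate(extract):.4%}")
```

With so few rare events expected in a 1,000-row extract, the sampled rate can swing widely from one extract to the next, or miss the events entirely. Changing the `seed` passed to `take_extract` shows that variability.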
