Over time, organizations outgrow a single DB for storing and querying data and move to a set of DBs, each catering to different business requirements. A Data Platform might comprise:
- A search index
- A relational DB
- A NoSQL DB
- A data warehouse
Why a Data Warehouse?
It is useful to understand how the application uses the DB. One way to find out is to run a set of inspection queries against it. Because those queries can slow down your primary workload, you can run them on isolated replica nodes instead.
However, at some point the schema of the DB data is no longer suitable for queries that need a global view of the DB. An ETL pipeline then extracts the data, reshapes it into the desired schema, and stores it in a data warehouse, often backed by object storage such as S3.
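
To make the ETL idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the `customers` and `orders` tables, the `daily_revenue` target table, and the use of a second in-memory SQLite database as a stand-in for a real warehouse. It only shows the extract, transform, load shape, not how a production pipeline would be built.

```python
import sqlite3


def extract(source):
    """Extract: read rows from the application's (hypothetical) normalized schema."""
    return source.execute(
        "SELECT o.id, o.created_at, c.country, o.amount_cents "
        "FROM orders o JOIN customers c ON c.id = o.customer_id"
    ).fetchall()


def transform(rows):
    """Transform: reshape rows into revenue per (day, country)."""
    totals = {}
    for _order_id, created_at, country, amount_cents in rows:
        key = (created_at[:10], country)  # keep only the date part of the ISO timestamp
        totals[key] = totals.get(key, 0) + amount_cents
    return [(day, country, cents) for (day, country), cents in sorted(totals.items())]


def load(warehouse, facts):
    """Load: write the reshaped rows into the analytics table (hypothetical schema)."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue ("
        "day TEXT, country TEXT, revenue_cents INTEGER, "
        "PRIMARY KEY (day, country))"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_revenue VALUES (?, ?, ?)", facts
    )
    warehouse.commit()


def run_demo():
    # Source DB: stands in for the application database (or a replica of it).
    source = sqlite3.connect(":memory:")
    source.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            created_at TEXT,
            amount_cents INTEGER
        );
        INSERT INTO customers VALUES (1, 'DE'), (2, 'US');
        INSERT INTO orders VALUES
            (1, 1, '2024-01-01T09:00:00', 1000),
            (2, 2, '2024-01-01T10:30:00', 2500),
            (3, 1, '2024-01-01T18:45:00', 500),
            (4, 2, '2024-01-02T08:15:00', 700);
        """
    )
    # Warehouse: a second in-memory DB used here only as a stand-in.
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT day, country, revenue_cents FROM daily_revenue ORDER BY day, country"
    ).fetchall()
```

Calling `run_demo()` returns one row per day and country, so analytical queries no longer need to join or scan the application tables.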
Why a Search Index?
A search index lets applications search the data in the DB. It is usually a Lucene-based solution such as Elasticsearch (ES) or Solr. Because updating the index in the write path is expensive, the index is typically updated outside of it and is therefore only eventually consistent with the DB.
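
The eventual consistency described above can be illustrated with a small, self-contained sketch. It is a toy model, not how ES or Solr work internally: the `IndexedStore` class, its in-memory dict "DB", the queue of pending changes, and the whitespace-tokenized inverted index are all hypothetical names and simplifications chosen for this example. The point is only that a write returns before the index is updated, and a separate indexing step catches up later.

```python
from collections import defaultdict, deque


class IndexedStore:
    """Toy model of a DB with an asynchronously updated search index."""

    def __init__(self):
        self.db = {}                    # primary store: doc_id -> text
        self.pending = deque()          # doc_ids changed but not yet indexed
        self.index = defaultdict(set)   # toy inverted index: term -> doc_ids
        self.indexed = {}               # doc_id -> text the index currently reflects

    def write(self, doc_id, text):
        """Write path: update the DB and queue the change; the index is untouched."""
        self.db[doc_id] = text
        self.pending.append(doc_id)

    def run_indexer(self):
        """Background step: apply queued changes to the index."""
        while self.pending:
            doc_id = self.pending.popleft()
            # Remove terms from the previously indexed version of the document.
            for term in self.indexed.get(doc_id, "").lower().split():
                self.index[term].discard(doc_id)
            # Index the current version stored in the DB.
            text = self.db[doc_id]
            for term in text.lower().split():
                self.index[term].add(doc_id)
            self.indexed[doc_id] = text

    def search(self, term):
        """Return the sorted doc_ids whose indexed text contains the term."""
        return sorted(self.index.get(term.lower(), set()))
```

With this sketch, a `search` issued right after a `write` does not see the new document; it becomes visible only once `run_indexer()` has run. That window is the price paid for keeping the write path cheap.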