What is a Data Platform?

Over time, an organization outgrows a single DB for storing and querying data and ends up with a set of data stores, each catering to a different business requirement. A Data Platform might comprise:

  • A search index
  • A relational DB
  • A NoSQL DB
  • A data warehouse

Why a Data Warehouse?

It is often useful to understand how the application uses the DB. You can inspect usage by running a set of analytical queries, but those queries can affect your primary workload, so they are usually run against isolated replica nodes created for that purpose.
However, there comes a time when the schema of the DB is no longer suitable for queries that need a global view of the data. An ETL pipeline then reshapes the data into the desired schema and stores it in a data warehouse, for example one backed by object storage such as S3.
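
To make the ETL step concrete, here is a minimal sketch using two in-memory SQLite databases as stand-ins for the application DB and the warehouse. The table names (orders, customer_summary) and the run_etl and demo helpers are made up for illustration; a real pipeline would read from a replica and load into an actual warehouse.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> None:
    """Reshape per-order rows into a per-customer summary table."""
    # Extract: read the rows in the shape the application stores them.
    rows = source.execute("SELECT customer, amount FROM orders").fetchall()

    # Transform: aggregate into the shape that analytical queries want.
    totals = {}
    for customer, amount in rows:
        count, total = totals.get(customer, (0, 0.0))
        totals[customer] = (count + 1, total + amount)

    # Load: write the reshaped data into the warehouse schema.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary "
        "(customer TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_summary VALUES (?, ?, ?)",
        [(c, n, t) for c, (n, t) in totals.items()],
    )
    warehouse.commit()


def demo():
    """Build a toy source DB (hypothetical data), run the ETL, return the summary."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    source.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
    )
    warehouse = sqlite3.connect(":memory:")
    run_etl(source, warehouse)
    return warehouse.execute(
        "SELECT customer, order_count, total_amount "
        "FROM customer_summary ORDER BY customer"
    ).fetchall()
```

The point of the reshaping is that "how much has each customer spent?" becomes a single-row lookup in the warehouse instead of a scan over the primary workload's tables.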

Why a Search Index?

A search index lets applications search the data held in the DB. It is primarily a Lucene-based solution such as Elasticsearch (ES) or Solr. The index is mostly eventually consistent with the DB, because updating the index in the write path is expensive.
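
As a rough illustration of what "eventually consistent" means here, the sketch below keeps the write path cheap by only recording that a document changed, and lets a separate sync step update a toy inverted index later. The IndexedStore class and its methods are hypothetical and are not how ES or Solr are implemented.

```python
from collections import defaultdict, deque


class IndexedStore:
    """Toy primary store plus a search index that is updated out of band."""

    def __init__(self):
        self.rows = {}                 # stand-in for the DB: doc_id -> text
        self.pending = deque()         # changes that are not indexed yet
        self.index = defaultdict(set)  # inverted index: term -> doc_ids

    def write(self, doc_id, text):
        # Write path: store the row and note the change; the index is untouched.
        self.rows[doc_id] = text
        self.pending.append(doc_id)

    def sync(self):
        # Indexer: drain the change log and bring the index up to date.
        while self.pending:
            doc_id = self.pending.popleft()
            for ids in self.index.values():
                ids.discard(doc_id)    # drop stale postings for this document
            for term in self.rows[doc_id].lower().split():
                self.index[term].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

Between write and sync, a search can miss a document that is already in the DB. That window is the price paid for keeping index updates out of the write path.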

The most efficient image format: WebP?

I always assumed that PNG (lossless) and JPEG (lossy) were the most efficient image compression formats. However, there is a newer image format, WebP. It is developed by Google and supports both lossy and lossless compression.

How is it better than PNG/JPEG?

  • A WebP image is typically ~30% smaller in storage size than a comparable PNG or JPEG.
  • It is supported by all major browsers.

Why is it better?

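  • It borrows its techniques from video compression, specifically VP8.
  • VP8 is a predictive lossy compression technique that covers both intra-frame and inter-frame coding.
  • WebP uses only the intra-frame technique, as we are dealing with single images.
  • A layman's explanation of VP8 is that it divides the image into an n×n grid; each cell is called a macroblock.
  • In video, macroblocks are checked for motion against other frames. A frame that is encoded without reference to any other frame is called a ‘key’ frame, and a lossy WebP image is effectively a single key frame.
  • Within that frame, each macroblock is predicted from its already-decoded neighbours, and only the diff between the prediction and the actual block is encoded. Smooth regions, such as sky, predict well and leave a small diff.
  • The diff is transformed, quantized and then encoded using arithmetic coding, which compresses better than the Huffman coding typically used in JPEG.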
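
The predict-then-encode-the-diff idea above can be sketched in a few lines. This is a heavily simplified illustration and not VP8 itself: it assumes a grayscale image given as a list of rows with values 0..255, uses a single prediction mode (the mean of the reconstructed block to the left, or 128 at the left edge), and skips the transform and the arithmetic coder entirely. The encode function and its block and step parameters are made up for this example.

```python
def encode(image, block=2, step=4):
    """Return (quantized residuals, reconstruction as a decoder would see it)."""
    h, w = len(image), len(image[0])
    recon = [[0] * w for _ in range(h)]
    residuals = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            rows = range(by, min(by + block, h))
            # Predict from the already-reconstructed block to the left.
            if bx == 0:
                pred = 128  # nothing to the left: fall back to mid-gray
            else:
                left = [recon[y][x] for y in rows for x in range(bx - block, bx)]
                pred = round(sum(left) / len(left))
            for y in rows:
                for x in range(bx, min(bx + block, w)):
                    # Quantizing the diff is the lossy step.
                    q = round((image[y][x] - pred) / step)
                    residuals[y][x] = q
                    recon[y][x] = max(0, min(255, pred + q * step))
    return residuals, recon
```

On a smooth region the residuals are zero or close to it, which is what makes them cheap to entropy-code afterwards. Note that the encoder predicts from the reconstructed pixels rather than the originals, so that a decoder, which only ever sees reconstructed data, makes the same predictions.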

Should you use WebP?

  • Yes, in most cases: it is well supported and usually compresses better than PNG/JPEG.
  • Encoding uses slightly more RAM than PNG.

Hard drives: Native Command Queuing

Even a simple hard drive today is capable of things that sound like outlandish technology. To see one of them, do some file I/O in your application from many threads.

Say you have 4 threads: A, B, C, and D. The I/O requests come in from A, then B, and so on. If you check the order in which these threads return, it might be surprising: thread D may return before A. How?

Disks have a technology called Native Command Queuing (NCQ). They queue your requests and process them using a single, simple rule:

Serve the request that can be completed fastest.

This depends on the head position of the disk. The request that can be served with the least head movement is served first.
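
A toy model of that rule, assuming each request targets a single track number and that cost is just the distance the head has to travel. Real drive firmware also takes rotational position into account and its exact algorithm is not public, so service_order below is only a hypothetical sketch of the idea.

```python
def service_order(head, requests):
    """Return thread names in the order a nearest-first disk would serve them.

    head: current track of the disk head.
    requests: dict mapping thread name -> target track, in arrival order.
    """
    pending = dict(requests)
    order = []
    while pending:
        # Pick the request that needs the least head movement from here.
        name = min(pending, key=lambda n: abs(pending[n] - head))
        head = pending.pop(name)  # the head is now at that request's track
        order.append(name)
    return order


# Requests arrive as A, B, C, D while the head sits at track 50.
print(service_order(50, {"A": 10, "B": 90, "C": 70, "D": 55}))
# prints ['D', 'C', 'B', 'A']
```

Thread D asked last but its track is only 5 away from the head, so it is served first, while A, which asked first, ends up last.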