Design Problems of Postgres - Part I

This post is a quick summary of why Uber moved from Postgres to MySQL.

Postgres Rows and CTID

  • Postgres provides transactions, and transactions need multiple versions of data, so PG is a multi-version database (MVCC).
  • PG treats each row as immutable. Any change to a row creates a new row version.
  • A row version is identified by its physical location on disk, called its ctid.
  • Every row version has a unique ctid because each one occupies a different location. However, several row versions (each with its own ctid) can share the same disk block, e.g. multiple versions of one row.
------------------
| Row 0 (ctid 1) |
------------------
                       ----------> Disk block offset x
------------------
| Row 1 (ctid 2) |
------------------

Postgres Index Management

  • Each index maps a key to a ctid.
  • Any change to a row’s data creates a new ctid, so every index on the table has to be updated.
  • This is expensive because:
    • PG uses write-ahead logging (WAL), so each write happens at least twice.
    • PG replicates the WAL to secondary nodes.
    • The WAL carries ctid and disk-offset-level information.
    • Replication across geographies is expensive because the data volume is high.
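
The cost described above can be sketched with a toy model. This is not PostgreSQL's storage code: `ToyTable`, its `(block, offset)` ctid tuples, the dict-based indexes, and the sample row are all hypothetical and exist only for illustration. The sketch shows that when every update appends a new row version with a new ctid, each index needs a new entry, even if the changed column is not indexed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy append-only heap: every update stores a new row version."""
    indexed_columns: tuple
    rows_per_block: int = 4                      # assumption: tiny blocks for the demo
    heap: list = field(default_factory=list)     # row versions in insertion order
    indexes: dict = field(default_factory=dict)  # column -> {key: [ctid, ...]}
    index_writes: int = 0

    def __post_init__(self):
        for col in self.indexed_columns:
            self.indexes[col] = {}

    def _append(self, row):
        # In this model a ctid is (block number, offset within the block).
        slot = len(self.heap)
        ctid = (slot // self.rows_per_block, slot % self.rows_per_block)
        self.heap.append(dict(row))
        # Every index gets an entry that points at the new ctid.
        for col, index in self.indexes.items():
            index.setdefault(row[col], []).append(ctid)
            self.index_writes += 1
        return ctid

    def insert(self, row):
        return self._append(row)

    def update(self, ctid, **changes):
        # The old version is left untouched; a new version is appended.
        old = self.heap[ctid[0] * self.rows_per_block + ctid[1]]
        return self._append({**old, **changes})


table = ToyTable(indexed_columns=("id", "first", "last"))
v1 = table.insert({"id": 1, "first": "Ada", "last": "Lovelace", "year": 1815})
writes_after_insert = table.index_writes
v2 = table.update(v1, year=1816)  # "year" is not indexed
print(v1, v2, table.index_writes - writes_after_insert)  # (0, 0) (0, 1) 3
```

Both versions land in block 0 with different ctids, and the single-column update still costs three index writes, one per index.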

Postgres Replication

  • On a secondary, WAL replay waits while a transaction is in progress on an affected row.
  • If the transaction runs for a long time, PG terminates it after a timeout.
  • So there are two problems:
    • Transactions can terminate unexpectedly.
    • Replicas may lag the master more than expected.
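
The two problems can be seen in a small timeline sketch. It is a simplified model, not how PostgreSQL schedules replay: the function `replay_conflicting_record`, its parameters, and the numbers are hypothetical, and time is just an integer number of seconds.

```python
def replay_conflicting_record(record_arrival, txn_end, max_delay):
    """Model one WAL record that conflicts with a transaction on the replica.

    Returns (apply_time, txn_cancelled).
    """
    if txn_end <= record_arrival:
        return record_arrival, False   # transaction already done: no conflict
    deadline = record_arrival + max_delay
    if txn_end <= deadline:
        return txn_end, False          # replay waits, so the replica lags
    return deadline, True              # timeout reached: transaction is terminated


# Short transaction: replay is delayed by 3 seconds, nothing is cancelled.
short = replay_conflicting_record(record_arrival=10, txn_end=13, max_delay=30)
# Long transaction: replay lags by the full timeout and the transaction dies.
long_running = replay_conflicting_record(record_arrival=10, txn_end=105, max_delay=30)
print(short, long_running)  # (13, False) (40, True)
```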

Designing a performance stats framework

Performance stats are vital to understanding the dynamic efficiency of your code and can help you find and fix bottlenecks. This post discusses how to plug these stats into your program.

There are primarily two classes of stats:

  • Counts
  • Time stamps

Stats are derived from events. You want to know how many times an event happened and how long an operation took to complete. So, segregate your events into either “count” or “time”. Time can be reported as best, average, and worst for an operation.

Next, figure out the fundamental events, e.g. “How many times was printf called?” or “What is the average time my program spends in each call?” To compute the latter, you need the number of printf calls and the total time spent in all of them. An event like this average is derived: it depends on other events.
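
As a minimal sketch of such a derived event, the average below is computed from the two fundamental events. The function name and the sample numbers are made up for illustration.

```python
def average_time(total_seconds, call_count):
    """Derived event: mean time per call, from two fundamental events."""
    if call_count == 0:
        return 0.0  # assumption: report 0 when the event never fired
    return total_seconds / call_count


# Hypothetical measurements: 4 calls that took 0.5 seconds in total.
print(average_time(0.5, 4))  # 0.125
```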

The design of a perf stats framework needs the following phases:

  • Collection: just collect raw data. This is the most frequent operation.
  • Processing: perform all the calculations needed to compute derived events.
  • Visualization

Keep in mind that the collection phase should be very quick. Do not perform any calculation during collection. This reduces the intrusiveness of the event collection statements. The code that collects events should be tiny and can be inlined or #define’ed.
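
Here is one way the collection and processing phases might be separated. It is a sketch in Python for brevity, so the collectors are small functions rather than macros or inline code. All names (`count_event`, `record_span`, `process`, `work`) are hypothetical. Collection only stores raw counters and raw time stamps; every subtraction and average happens later in `process`.

```python
import time
from collections import defaultdict

# Collection phase: store raw data only.
counts = defaultdict(int)  # event name -> number of occurrences
spans = defaultdict(list)  # event name -> raw (start, end) time stamps


def count_event(name):
    counts[name] += 1


def record_span(name, start, end):
    spans[name].append((start, end))


def process():
    """Processing phase: compute derived stats once, after collection."""
    report = {}
    for name, raw in spans.items():
        elapsed = [end - start for start, end in raw]
        report[name] = {
            "calls": len(elapsed),
            "best": min(elapsed),
            "average": sum(elapsed) / len(elapsed),
            "worst": max(elapsed),
        }
    return report


def work(n):
    """Example operation instrumented with the collectors above."""
    count_event("work")
    start = time.perf_counter()
    total = sum(range(n))
    record_span("work", start, time.perf_counter())
    return total


for n in (10, 1000, 100000):
    work(n)

report = process()
print(counts["work"], report["work"])
```

Keeping every sample costs memory. A longer-running program might store only a running total and count per event, at the price of one addition during collection.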

 

VMware player vs VirtualBox: which is better?

I have a Windows XP host machine with an Ubuntu guest OS. I started with the latest VMware Player, and these are the benefits I noticed:

VMware

1. Hassle-free installation of Ubuntu (VMware provides “easy install” for Ubuntu)
2. Seamless integration between guest and host OS
o) You can copy-paste or move files between host and guest!
o) The clipboard is shared bi-directionally.

3. Very good performance: applications, network, and devices (DVD)
4. The display scales well to the proper resolution on a bigger screen with “VMware Tools”.
5. You can simulate multiple processors (2 cores)

VirtualBox

1. Easy Ubuntu installation
2. Most horrible and pathetic clipboard sharing. Contrary to what is claimed, it provided only a one-way clipboard, from guest to host.
3. The XP clipboard stopped working and stayed broken until I shut down VirtualBox.
4. Supports VMDK files, but the support is crappy and buggy and leaves the VMDK in an unusable state. I could never run my VMDK with VMware afterwards.
5. Can simulate up to 4 cores.
6. Supports shared folders between host and guest. I think it’s a generation behind what VMware provides.
7. Very slow performance: applications, system start-up/shutdown, and stand-by. This is the biggest letdown.

My verdict: VMware!

VirtualBox has miles to go and I am using VMware happily.

The best virtualization software: VirtualBox vs VMPlayer

I was an avid user of VirtualBox and was pretty happy with it. It gave me the freedom to experiment with different OSes in a rather safe, transparent, and convenient way. So I used to keep Linux as the host and Windows as the guest. Performance was good and bugs were non-existent.

Then I found VMPlayer. It surprised me a lot, for the following reasons:

a) It supported ISO images for Ubuntu, my fav Linux distro.

b) Setup was as easy as VirtualBox, or easier.

c) Unity mode: What a feature! A seamless integration of the guest OS with host applications. You don’t have to click the mouse, and you never lose window focus. I can now switch to guest OS windows with Alt+Tab.

d) Very stable, good performance, and I am a happy customer.

e) The biggest benefit of VMPlayer is seamless file operations between host and guest. You can copy, move, and delete files without any problem. VirtualBox still doesn’t have this feature (as of version 4+, May 2012).

f) The clipboard on VirtualBox is buggy and non-functional most of the time. VMPlayer has a reliable clipboard, and you can freely move data between host and guest.