Ceph RGW Internals: Cache Coherence & Bucket Life Cycles

RGW Cache Coherence

Why does RGW have a control pool? This section looks at its purpose and how RGW uses it for cache synchronization.

RGWRados::init_watch()

Creates watcher objects in the RGW control pool:

$ sudo rados ls -p .in-abc-1.rgw.control
notify.1
notify.6
notify.3
notify.7
notify.2
notify.4
notify.5
notify.0

The common assumption is that RGW watches these objects and, when a change is signalled on one of them, the threads sync their caches.
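The watch-and-sync idea above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions, not RGW code: ControlPool, Gateway and their methods are hypothetical names, std::hash stands in for whatever hash RGW uses to pick a notify object, and "syncing" is reduced to dropping the stale cache entry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the control pool: a fixed set of notify.N
// objects, each with a list of watcher callbacks.
class ControlPool {
public:
  using Callback = std::function<void(const std::string& key)>;

  explicit ControlPool(std::size_t num_objects) : watchers_(num_objects) {
    assert(num_objects > 0);
  }

  // Name of the i-th notify object, e.g. "notify.3".
  static std::string notify_oid(std::size_t i) {
    return "notify." + std::to_string(i);
  }

  // Register one callback on every notify object, as each gateway would.
  void watch_all(const Callback& cb) {
    for (auto& w : watchers_) w.push_back(cb);
  }

  // Assumption: the changed key is hashed to pick one notify object, and
  // every watcher of that object is called back. Returns the chosen index.
  std::size_t notify(const std::string& key) {
    std::size_t idx = std::hash<std::string>{}(key) % watchers_.size();
    for (auto& cb : watchers_[idx]) cb(key);
    return idx;
  }

private:
  std::vector<std::vector<Callback>> watchers_;
};

// Hypothetical gateway with a local metadata cache. On notification it
// drops the affected entry so the next read goes back to the store.
class Gateway {
public:
  explicit Gateway(ControlPool& pool) {
    pool.watch_all([this](const std::string& key) { cache_.erase(key); });
  }
  Gateway(const Gateway&) = delete;            // callback captures `this`
  Gateway& operator=(const Gateway&) = delete;

  void put_cached(const std::string& key, const std::string& value) {
    cache_[key] = value;
  }
  bool has_cached(const std::string& key) const {
    return cache_.count(key) > 0;
  }

private:
  std::map<std::string, std::string> cache_;
};
```

With two Gateway instances sharing one ControlPool, a call to pool.notify("bucket1") removes "bucket1" from both caches. Spreading keys over several notify objects is what keeps a single object from becoming a hot spot.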

RGWObjTags

class RGWPutLC: public RGWOp
- RGW_OP_PUT_LC
- RGW_OP_TYPE_WRITE
class RGWPutLC_ObjStore_S3: public RGWPutLC_ObjStore

    RGWHandler_REST_Bucket_S3::op_put()
      if(is_lc_op()) {
        return new RGWPutLC_ObjStore_S3;
class RGWHandler_REST_Bucket_S3 : public RGWHandler_REST_S3 {
protected:
  bool is_acl_op() {
    return s->info.args.exists("acl");
  }
  bool is_cors_op() {
    return s->info.args.exists("cors");
  }
  bool is_lc_op() {
    return s->info.args.exists("lifecycle");
  }
  bool is_obj_update_op() override {
    return is_acl_op() || is_cors_op();
  }
  bool is_request_payment_op() {
    return s->info.args.exists("requestPayment");
  }
  bool is_policy_op() {
    return s->info.args.exists("policy");
  }
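
The snippets above show that the handler picks an op by checking which sub-resource appears in the query string. A minimal, self-contained sketch of that dispatch pattern follows. The op types, the Args alias and bucket_op_put() are hypothetical stand-ins, and the order of the checks is an assumption rather than the order used in RGW.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical op hierarchy standing in for RGWOp and its subclasses.
struct Op {
  virtual ~Op() = default;
  virtual std::string name() const = 0;
};
struct PutACLs : Op { std::string name() const override { return "put_acls"; } };
struct PutCORS : Op { std::string name() const override { return "put_cors"; } };
struct PutLC : Op { std::string name() const override { return "put_lc"; } };
struct CreateBucket : Op { std::string name() const override { return "create_bucket"; } };

// Query-string arguments of the request, e.g. PUT /bucket?lifecycle
using Args = std::map<std::string, std::string>;

// Choose the op for a bucket-level PUT from the sub-resource in the query.
// A plain PUT with no known sub-resource falls through to bucket creation.
std::unique_ptr<Op> bucket_op_put(const Args& args) {
  auto has = [&args](const char* key) { return args.count(key) > 0; };
  if (has("acl")) return std::make_unique<PutACLs>();
  if (has("cors")) return std::make_unique<PutCORS>();
  if (has("lifecycle")) return std::make_unique<PutLC>();
  return std::make_unique<CreateBucket>();
}
```

For example, bucket_op_put({{"lifecycle", ""}}) yields the PutLC op, which mirrors how is_lc_op() gates the creation of RGWPutLC_ObjStore_S3.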

RGW OP Handler

RGWOp* RGWHandler_REST::get_op(RGWRados* store)
{
  RGWOp *op;
  switch (s->op) {
   case OP_GET:
     op = op_get();
     break;
   case OP_PUT:
     op = op_put();
     break;
   case OP_DELETE:
     op = op_delete();
     break;
   case OP_HEAD:
     op = op_head();
     break;
   case OP_POST:
     op = op_post();
     break;
   case OP_COPY:
     op = op_copy();
     break;
   case OP_OPTIONS:
     op = op_options();
     break;
   default:
     return NULL;
  }

  if (op) {
    op->init(store, s, this);
  }
  return op;
} /* get_op */

For a lifecycle PUT, the op takes an exclusive lock on the LC object for the given bucket shard. Next, it sets the OID in OMAP. Uses: rgw_cls_lc_set_entry()
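
The lock-then-set sequence can be illustrated with an in-memory stand-in for one LC object. This is a sketch under assumptions: LCShardObject, its cookie-based lock and set_lc_entry() are hypothetical and only mimic the shape of the real lock and rgw_cls_lc_set_entry() calls.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical status value; the real code uses its own LC status enum.
enum LCStatus { LC_UNINITIAL = 0 };

// Hypothetical in-memory stand-in for one LC object (e.g. lc.5).
struct LCShardObject {
  std::string lock_cookie;               // empty means "not locked"
  std::map<std::string, LCStatus> omap;  // entry key -> LC status

  // Exclusive lock: fails if anyone currently holds the lock.
  bool lock_exclusive(const std::string& cookie) {
    if (!lock_cookie.empty()) return false;
    lock_cookie = cookie;
    return true;
  }

  // Only the holder of the matching cookie may release the lock.
  void unlock(const std::string& cookie) {
    if (lock_cookie == cookie) lock_cookie.clear();
  }
};

// Register a bucket for LC processing: lock, write the OMAP entry, unlock.
// Returns false without touching OMAP when the lock is held elsewhere.
bool set_lc_entry(LCShardObject& obj, const std::string& cookie,
                  const std::string& bucket_key) {
  if (!obj.lock_exclusive(cookie)) return false;
  obj.omap[bucket_key] = LC_UNINITIAL;
  obj.unlock(cookie);
  return true;
}
```

The exclusive lock keeps two gateways from modifying the same LC object's OMAP at the same time. A caller that fails to get the lock would typically retry after a delay.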

RGWLC Invocation

Entry for LC: RGWRados::init_complete()

This function reads the zone and zone group configuration and creates
a connection to the zone endpoint. It also creates I/O contexts for the
root, GC, LC, objexp (log) and reshard pools.
GC uses the RGWObjectExpirer object, which uses the objexp (log) pool.

 lc = new RGWLC();
 lc->initialize(cct, this);

 if (use_lc_thread)
   lc->start_processor();

The RGWLC class contains an LCWorker class.

void RGWLC::initialize(CephContext *_cct, RGWRados *_store)

- creates LC object names as lc.0, lc.1,...,lc.31
- creates a cookie buffer
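
The LC object naming can be sketched as follows. The helper names are hypothetical, and the way a bucket is mapped to one of the objects (std::hash modulo the object count) is an assumption made for illustration. RGW uses its own hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Build the LC object names lc.0, lc.1, ..., lc.(n-1).
std::vector<std::string> make_lc_oids(std::size_t n,
                                      const std::string& prefix = "lc") {
  std::vector<std::string> oids;
  oids.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    oids.push_back(prefix + "." + std::to_string(i));
  }
  return oids;
}

// Assumption: a bucket's shard ID is hashed to choose its LC object, so the
// same bucket always lands on the same object.
std::size_t pick_lc_index(const std::string& shard_id, std::size_t n) {
  assert(n > 0);
  return std::hash<std::string>{}(shard_id) % n;
}
```

Calling make_lc_oids(32) produces the 32 names from lc.0 to lc.31. Spreading buckets over a fixed set of objects lets several workers process different objects in parallel.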

void RGWLC::start_processor()

- spawns LCWorker threads
- each thread calls lc->process()

lc->process()

// src/cls/rgw/cls_rgw_const.h
// The above file has CLS functions of RGW.

It stores the operation metadata in struct cls_rgw_lc_obj_head.
This structure has two fields: a time and a marker string.

The list of objects is retrieved from OMAP by rgw_cls_lc_list_entries().
Its input op (cls_rgw_lc_list_entries_op) carries a marker, a filter prefix
and the maximum number of entries.
At most MAX_LC_LIST_ENTRIES (100) entries are returned in one read.
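
Marker-based listing with a per-read limit can be sketched like this. A std::map stands in for the object's OMAP, and ListResult, list_entries() and list_all() are hypothetical names. Only the limit of 100 entries per read comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

const std::size_t MAX_LC_LIST_ENTRIES = 100;

struct ListResult {
  std::vector<std::pair<std::string, int>> entries;  // key -> status
  bool truncated = false;  // true when more matching entries remain
};

// Return up to max_entries entries whose key sorts after `marker` and
// starts with `prefix`. An empty marker means "start from the beginning".
ListResult list_entries(const std::map<std::string, int>& omap,
                        const std::string& marker,
                        const std::string& prefix,
                        std::size_t max_entries) {
  ListResult r;
  auto it = marker.empty() ? omap.begin() : omap.upper_bound(marker);
  for (; it != omap.end(); ++it) {
    if (it->first.compare(0, prefix.size(), prefix) != 0) continue;
    if (r.entries.size() == max_entries) {
      r.truncated = true;  // one more match exists beyond this page
      break;
    }
    r.entries.emplace_back(it->first, it->second);
  }
  return r;
}

// Page through all matching entries, using the last key as the next marker.
std::vector<std::string> list_all(const std::map<std::string, int>& omap,
                                  const std::string& prefix) {
  std::vector<std::string> keys;
  std::string marker;
  ListResult r;
  do {
    r = list_entries(omap, marker, prefix, MAX_LC_LIST_ENTRIES);
    for (const auto& e : r.entries) keys.push_back(e.first);
    if (!r.entries.empty()) marker = r.entries.back().first;
  } while (r.truncated);
  return keys;
}
```

With 250 matching entries, list_all() needs three reads of 100, 100 and 50 entries. Bounding each read keeps a single OMAP request small no matter how many buckets have lifecycle rules.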

Each list entry yields a bucket ID, and for each entry the bucket state
in OMAP is set to uninitial.

RGWLC::bucket_lc_process(string& shard_id)

The shard ID consists of the tenant, bucket name and bucket ID.
A bucket must have RGW_ATTR_LC set to be eligible for LC processing.
