ART#214 - What are repository cache modes in ATG?





Welcome to the last, but very important, topic on ATG Repositories.
Before we start, we must understand the term "cache".

What is a cache?
From a very general computing perspective (not limited to ATG), a cache is a place where frequently accessed data is stored for "quick access".
Your computer has RAM, where all the programs currently being executed are loaded. RAM is large and holds a lot of programs, and some program instructions are accessed very frequently. Fetching them from RAM every time is slow compared to the speed of the CPU. To overcome this, frequently accessed instructions are also kept in a "cache": a memory with a very small storage capacity (for example, 10MB compared to 8GB of RAM) that is built from much faster memory and sits much closer to the CPU. It is small precisely because such fast memory is expensive.
Therefore, the CPU keeps frequently accessed instructions in the cache for quick and easy access, and goes to RAM only when what it needs is not in the cache.

The same concept is used in data storage, and ATG is no different.
For any web application (ATG or non-ATG), all the data is stored in database tables. A database hit is an expensive operation compared to reading data that is already in memory. Imagine a website where thousands (or millions) of users request pages that fetch data from the database: the performance impact can be very high.
This is where the Repository Cache comes into play.
There are various ways to configure caching for repositories, which we will learn about soon.
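To make the idea concrete, here is a minimal sketch in Java (the language ATG itself is written in) of a read-through cache that counts database hits. ReadThroughCache is a hypothetical class, not ATG's Repository API, and the "database" is just a function that builds a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache; NOT ATG's Repository API.
// It only models "check the cache first, go to the database on a miss".
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> database; // stands in for a real SQL query
    private int databaseHits = 0;

    ReadThroughCache(Function<K, V> database) {
        this.database = database;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit: no database access
        }
        databaseHits++; // cache miss: pay for one database hit
        V loaded = database.apply(key);
        cache.put(key, loaded);
        return loaded;
    }

    int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> products =
                new ReadThroughCache<>(id -> "Product " + id);
        for (int i = 0; i < 1000; i++) {
            products.get("prod1001"); // 1000 page views of the same product
        }
        System.out.println(products.databaseHits()); // prints 1
    }
}
```

A thousand reads of the same product cost one database hit instead of a thousand; that saving is the whole point of repository caching.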

What can be cached?
A simple answer to this question would be: we can cache repository items and even queries.

Item Caches
The item cache stores repository items, indexed by their repository IDs. A cached item is invalidated whenever the item is updated; how and where that invalidation happens depends on the cache mode you configure.
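A minimal sketch of an item cache, assuming a plain in-memory map plays the role of the database. CachedItemStub and ItemCacheSketch are hypothetical names, not ATG classes; the point is only the indexing by repository ID and the invalidation on update.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a repository item (NOT atg.repository.RepositoryItem).
final class CachedItemStub {
    private final String repositoryId;
    private final String displayName;

    CachedItemStub(String repositoryId, String displayName) {
        this.repositoryId = repositoryId;
        this.displayName = displayName;
    }

    String repositoryId() { return repositoryId; }
    String displayName() { return displayName; }
}

// Hypothetical item cache: items are indexed by repository ID, and an update
// invalidates the cached copy so the next read reloads it from the database.
class ItemCacheSketch {
    private final Map<String, CachedItemStub> cacheById = new HashMap<>();
    private final Map<String, CachedItemStub> database = new HashMap<>();
    int databaseHits = 0;

    CachedItemStub getItem(String repositoryId) {
        CachedItemStub item = cacheById.get(repositoryId);
        if (item == null) {
            databaseHits++; // not cached: load from the database
            item = database.get(repositoryId);
            if (item != null) {
                cacheById.put(repositoryId, item);
            }
        }
        return item;
    }

    void updateItem(CachedItemStub item) {
        database.put(item.repositoryId(), item); // write to the database
        cacheById.remove(item.repositoryId());   // invalidate the stale cached copy
    }

    public static void main(String[] args) {
        ItemCacheSketch cache = new ItemCacheSketch();
        cache.updateItem(new CachedItemStub("sku10", "Red Shirt"));
        cache.getItem("sku10");
        cache.getItem("sku10"); // served from the cache
        cache.updateItem(new CachedItemStub("sku10", "Blue Shirt"));
        System.out.println(cache.getItem("sku10").displayName()); // prints Blue Shirt
        System.out.println(cache.databaseHits);                   // prints 2
    }
}
```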

Query Caches
As the name suggests, the query cache stores the repository IDs returned by each query.
The corresponding repository items themselves are cached separately, in the item cache.

If a query is executed on a repository with query caching enabled, the repository first looks up that query in the query cache and retrieves the cached repository IDs.
Next, the items corresponding to these repository IDs are fetched from the item cache.
Items not present in the item cache are fetched from the database.

1. Query caching is disabled by default.
2. Do NOT use query caching if the repository is updated frequently OR the same queries are rarely repeated. In these cases, it might do more harm than good in terms of performance.
3. A query-cache entry can be invalidated in the following cases:

  • A property of a cached item that is referenced in the query is modified.
  • Items of the queried item type are added to or removed from the repository.
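The lookup flow and invalidation rules above can be modeled in a small Java sketch. QueryCacheSketch, ColoredProduct and the "color = ..." query key are hypothetical; for simplicity, any save or removal drops every cached query, which is a coarser rule than the two invalidation cases listed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative product item with one queryable property.
final class ColoredProduct {
    final String id;
    final String color;

    ColoredProduct(String id, String color) {
        this.id = id;
        this.color = color;
    }
}

// Hypothetical repository with a query cache (query -> repository IDs)
// layered on a separate item cache (repository ID -> item).
class QueryCacheSketch {
    private final Map<String, ColoredProduct> table = new LinkedHashMap<>(); // the "database"
    private final Map<String, ColoredProduct> itemCache = new HashMap<>();
    private final Map<String, List<String>> queryCache = new HashMap<>();
    int databaseQueries = 0;
    int databaseItemLoads = 0;

    List<ColoredProduct> findByColor(String color) {
        String queryKey = "color = " + color;
        // 1. Look up the repository IDs for this query in the query cache.
        List<String> ids = queryCache.get(queryKey);
        if (ids == null) {
            databaseQueries++; // query not cached: run it against the database
            ids = new ArrayList<>();
            for (ColoredProduct row : table.values()) {
                if (row.color.equals(color)) {
                    ids.add(row.id);
                }
            }
            queryCache.put(queryKey, ids);
        }
        // 2. Resolve each ID through the item cache; 3. load misses from the database.
        List<ColoredProduct> result = new ArrayList<>();
        for (String id : ids) {
            ColoredProduct item = itemCache.get(id);
            if (item == null) {
                databaseItemLoads++;
                item = table.get(id);
                itemCache.put(id, item);
            }
            result.add(item);
        }
        return result;
    }

    // Adding an item or changing a queried property invalidates cached queries.
    // This sketch coarsely drops every cached query on any save.
    void save(ColoredProduct product) {
        table.put(product.id, product);
        itemCache.remove(product.id);
        queryCache.clear();
    }

    void remove(String id) {
        table.remove(id);
        itemCache.remove(id);
        queryCache.clear();
    }

    public static void main(String[] args) {
        QueryCacheSketch repo = new QueryCacheSketch();
        repo.save(new ColoredProduct("p1", "red"));
        repo.save(new ColoredProduct("p2", "blue"));
        repo.save(new ColoredProduct("p3", "red"));
        repo.findByColor("red");
        repo.findByColor("red"); // answered from the query and item caches
        System.out.println(repo.databaseQueries + " " + repo.databaseItemLoads); // prints 1 2
    }
}
```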
The diagram below shows the concept of item and query caches.


What are different cache-modes in ATG?
A cache mode is the type of caching you configure. Cache modes mainly determine when and how a cached item is invalidated, so that users see consistent data.
Each cache mode has its own pros and cons, so let us look at each of them.

1. disabled
No item is stored in the cache; data is fetched directly from the database.
You should use this mode when items are updated very frequently and you simply cannot afford to show stale data on your website.
For example, for the InventoryRepository's "inventory" item descriptor, you always want the latest inventory to be shown on your site. If you enable caching, frequent inventory updates make it very likely that the cached data becomes stale. Also, maintaining a cache for very frequently changing items can have a high performance cost, sometimes even higher than fetching the data directly from the database.




WHEN TO USE?: Use when data is required in real time, e.g. inventory.
ADVANTAGE: You always see consistent data.
DISADVANTAGE: Performance. Every time you fetch data, there is a database hit.
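A hypothetical sketch contrasting "disabled" with a cached read. InventoryReaderSketch and InventoryCacheMode are illustrative names; in ATG itself you would set cache-mode="disabled" on the item descriptor in the repository definition XML rather than write code like this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of cache-mode "disabled" with a plain cached read.
// These names are illustrative only; they are not ATG classes.
enum InventoryCacheMode { DISABLED, CACHED }

class InventoryReaderSketch {
    private final InventoryCacheMode mode;
    private final Map<String, Integer> database = new HashMap<>();
    private final Map<String, Integer> cache = new HashMap<>();
    int databaseHits = 0;

    InventoryReaderSketch(InventoryCacheMode mode) {
        this.mode = mode;
    }

    // Simulates another process (e.g. an order on another server) changing stock
    // directly in the database, without touching this reader's cache.
    void setStockInDatabase(String skuId, int stock) {
        database.put(skuId, stock);
    }

    int getStock(String skuId) {
        if (mode == InventoryCacheMode.CACHED && cache.containsKey(skuId)) {
            return cache.get(skuId); // fast, but may be stale
        }
        databaseHits++; // "disabled" always ends up here
        int stock = database.getOrDefault(skuId, 0);
        if (mode == InventoryCacheMode.CACHED) {
            cache.put(skuId, stock);
        }
        return stock;
    }

    public static void main(String[] args) {
        InventoryReaderSketch disabled = new InventoryReaderSketch(InventoryCacheMode.DISABLED);
        disabled.setStockInDatabase("sku10", 5);
        disabled.getStock("sku10");
        disabled.setStockInDatabase("sku10", 0); // sold out elsewhere
        System.out.println(disabled.getStock("sku10")); // prints 0 (always fresh)
        System.out.println(disabled.databaseHits);      // prints 2 (one hit per read)
    }
}
```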

2. simple
In simple cache mode, each server maintains its OWN COPY of the cache. Invalidating the cache on one server does not invalidate the cache on any other server.
You should use this mode only for items that are rarely updated. For example, most of the item descriptors in the ProductCatalog are configured as "simple".
Also, the ProductCatalog is a versioned repository, and its data can ONLY be changed via BCC deployment (which automatically clears the cache on all servers for the repository items being deployed), so you won't have to worry about this one.

We also have to define how long an item stays in the cache before it is refreshed. This is done with the "item-cache-timeout" attribute, set in milliseconds. An item stays in the cache for "item-cache-timeout" milliseconds, after which the cached item is refreshed on its next access. If this value is set to zero, the item stays in the cache forever, unless it is manually invalidated.




WHEN TO USE?: Use for data that is rarely modified (product description, name, etc.).
ADVANTAGE: Simple and fast.
DISADVANTAGE: Users may see stale data on some servers. However, if used correctly, this can be a good option.
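Here is a hypothetical sketch of "simple" mode with two servers sharing one database. SimpleModeServer is an illustrative class, itemCacheTimeoutMs mimics item-cache-timeout (milliseconds, 0 = cache forever), and the explicit nowMs parameter is an assumption that keeps the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of cache-mode "simple": every server keeps its OWN cache.
// itemCacheTimeoutMs mirrors the idea of item-cache-timeout (milliseconds,
// 0 = never expire). The shared database map and explicit nowMs clock are
// assumptions that make the behavior deterministic.
class SimpleModeServer {
    private static final class Entry {
        final String value;
        final long loadedAtMs;

        Entry(String value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<String, String> database;                   // shared by all servers
    private final Map<String, Entry> localCache = new HashMap<>(); // private to this server
    private final long itemCacheTimeoutMs;

    SimpleModeServer(Map<String, String> database, long itemCacheTimeoutMs) {
        this.database = database;
        this.itemCacheTimeoutMs = itemCacheTimeoutMs;
    }

    String getItem(String id, long nowMs) {
        Entry entry = localCache.get(id);
        boolean expired = entry != null
                && itemCacheTimeoutMs > 0
                && nowMs - entry.loadedAtMs >= itemCacheTimeoutMs;
        if (entry == null || expired) {
            entry = new Entry(database.get(id), nowMs); // refresh on next access
            localCache.put(id, entry);
        }
        return entry.value;
    }

    // Writes to the database but invalidates ONLY this server's cache.
    void updateItem(String id, String value) {
        database.put(id, value);
        localCache.remove(id);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("prod1", "Old name");
        SimpleModeServer serverA = new SimpleModeServer(db, 60_000);
        SimpleModeServer serverB = new SimpleModeServer(db, 60_000);
        serverA.getItem("prod1", 0);
        serverB.getItem("prod1", 0);
        serverA.updateItem("prod1", "New name");
        System.out.println(serverA.getItem("prod1", 1_000));  // prints New name
        System.out.println(serverB.getItem("prod1", 1_000));  // prints Old name (stale)
        System.out.println(serverB.getItem("prod1", 60_000)); // prints New name (timed out)
    }
}
```

Server B keeps serving the old name until its own timeout expires, which is exactly the stale-data risk described above.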

3. distributed
Distributed cache mode is a more advanced cache mode. It is more consistent than simple cache mode, but it comes with some overhead.
It keeps the caches on all servers of the application consistent by sending invalidation messages over the network.
This cache mode is further divided into 3 sub-categories:

3.1 Distributed TCP
1. Whenever an item is changed on any server, an invalidation event is broadcast to all the servers that use distributed TCP caching.
2. The event carries some data, e.g. the repository ID and item type, to the other servers.
3. The other servers receive this data and invalidate that item in their caches.




WHEN TO USE?: Use for data that is modified infrequently but read very frequently. Items that change frequently should not use this cache mode.
ADVANTAGE: More consistent than simple cache mode.
DISADVANTAGE: Network overhead in sending messages. Also, if a server is down when the invalidation event is sent, there is no way of knowing whether that server received the event.
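The three steps above can be sketched as a fire-and-forget broadcast. InvalidationEvent and TcpCacheServer are hypothetical classes (no real sockets are involved); the "reachable" flag stands in for a server that cannot be contacted when the event goes out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of distributed TCP invalidation. The event carries the
// item descriptor and repository ID; delivery is fire-and-forget, so an
// unreachable peer misses it and the sender never finds out.
final class InvalidationEvent {
    final String itemDescriptor;
    final String repositoryId;

    InvalidationEvent(String itemDescriptor, String repositoryId) {
        this.itemDescriptor = itemDescriptor;
        this.repositoryId = repositoryId;
    }

    String cacheKey() {
        return itemDescriptor + ":" + repositoryId;
    }
}

class TcpCacheServer {
    final Map<String, String> cache = new HashMap<>();
    final List<TcpCacheServer> peers = new ArrayList<>();
    boolean reachable = true;

    // Called on the server where the item was changed.
    void itemChanged(InvalidationEvent event) {
        cache.remove(event.cacheKey()); // invalidate locally
        for (TcpCacheServer peer : peers) {
            peer.receive(event);        // broadcast; no acknowledgement is checked
        }
    }

    void receive(InvalidationEvent event) {
        if (!reachable) {
            return;                     // event is silently lost
        }
        cache.remove(event.cacheKey());
    }

    public static void main(String[] args) {
        TcpCacheServer s1 = new TcpCacheServer();
        TcpCacheServer s2 = new TcpCacheServer();
        TcpCacheServer s3 = new TcpCacheServer();
        s1.peers.add(s2);
        s1.peers.add(s3);
        InvalidationEvent event = new InvalidationEvent("product", "prod1");
        s2.cache.put(event.cacheKey(), "cached product");
        s3.cache.put(event.cacheKey(), "cached product");
        s3.reachable = false;
        s1.itemChanged(event);
        System.out.println(s2.cache.containsKey(event.cacheKey())); // prints false
        System.out.println(s3.cache.containsKey(event.cacheKey())); // prints true (stale)
    }
}
```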

3.2 Distributed JMS
1. When an item is changed on any server, a JMS message is sent to invalidate the cache on the other servers that use distributed JMS caching.
2. The message delivery status is also stored in an out-of-the-box (OOTB) database, which guarantees message delivery.




WHEN TO USE?: Use for data that is modified frequently, where a consistent view is always required.
ADVANTAGE: Provides a more consistent view of data than the simple and distributed TCP modes, and delivery of the invalidation message is guaranteed.
DISADVANTAGE: Much slower than distributed TCP.
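The difference from TCP mode is that a message is stored before it is delivered. The sketch below models that with a per-server pending queue standing in for the database-backed message store; JmsCacheServer is a hypothetical class and uses no real JMS API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of distributed JMS invalidation. Each message is recorded
// in a per-server pending queue (a stand-in for the persistent, database-backed
// message store) and removed only after that server has processed it, so a
// server that was unavailable catches up when it returns.
class JmsCacheServer {
    final Map<String, String> cache = new HashMap<>();
    final Deque<String> pendingMessages = new ArrayDeque<>();
    boolean available = true;

    void enqueue(String cacheKey) {
        pendingMessages.addLast(cacheKey); // stored before delivery is attempted
        deliverPending();
    }

    void deliverPending() {
        while (available && !pendingMessages.isEmpty()) {
            cache.remove(pendingMessages.peekFirst());
            pendingMessages.removeFirst(); // acknowledged only after processing
        }
    }

    void comeBackOnline() {
        available = true;
        deliverPending();
    }

    // Called by the server where the item changed.
    static void itemChanged(String cacheKey, List<JmsCacheServer> servers) {
        for (JmsCacheServer server : servers) {
            server.enqueue(cacheKey);
        }
    }

    public static void main(String[] args) {
        JmsCacheServer a = new JmsCacheServer();
        JmsCacheServer b = new JmsCacheServer();
        a.cache.put("product:prod1", "cached");
        b.cache.put("product:prod1", "cached");
        b.available = false;
        itemChanged("product:prod1", Arrays.asList(a, b));
        System.out.println(b.pendingMessages.size()); // prints 1 (still stored)
        b.comeBackOnline();
        System.out.println(b.cache.containsKey("product:prod1")); // prints false
    }
}
```

The extra bookkeeping per message is where the performance cost compared to TCP mode comes from.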

3.3 Distributed Hybrid
This is generally the most efficient of the distributed cache modes.
The invalidation event is sent ONLY to the servers on which a particular item is cached.

WHEN TO USE?: Real-time data access.
ADVANTAGE: Better performance, with reduced network traffic.
DISADVANTAGE: Slight network overhead.
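The key idea, sending invalidations only to servers that actually hold the item, can be sketched with a registry that records who cached what. HybridRegistry and HybridServer are hypothetical classes; this illustrates the targeting idea, not ATG's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of distributed hybrid invalidation: a registry tracks which
// servers currently cache each item, so an invalidation goes ONLY to those
// servers instead of being broadcast to all of them.
class HybridRegistry {
    private final Map<String, Set<HybridServer>> holders = new HashMap<>();
    int networkMessages = 0;

    void registerCached(String key, HybridServer server) {
        holders.computeIfAbsent(key, k -> new HashSet<>()).add(server);
    }

    void itemChanged(String key, HybridServer origin) {
        Set<HybridServer> targets = holders.remove(key);
        if (targets == null) {
            return; // nobody caches it: no messages at all
        }
        for (HybridServer server : targets) {
            if (server != origin) {
                networkMessages++; // only remote holders need a message
                server.invalidate(key);
            }
        }
    }
}

class HybridServer {
    private final Map<String, String> cache = new HashMap<>();
    private final HybridRegistry registry;

    HybridServer(HybridRegistry registry) {
        this.registry = registry;
    }

    String getItem(String key, Map<String, String> database) {
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            cache.put(key, value);
            registry.registerCached(key, this); // remember that this server holds it
        }
        return value;
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }

    void invalidate(String key) {
        cache.remove(key);
    }

    void updateItem(String key, String value, Map<String, String> database) {
        database.put(key, value);
        cache.remove(key);
        registry.itemChanged(key, this);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("prod1", "v1");
        HybridRegistry registry = new HybridRegistry();
        HybridServer s1 = new HybridServer(registry);
        HybridServer s2 = new HybridServer(registry);
        HybridServer s3 = new HybridServer(registry);
        HybridServer s4 = new HybridServer(registry); // never reads prod1
        s2.getItem("prod1", db);
        s3.getItem("prod1", db);
        s1.updateItem("prod1", "v2", db);
        System.out.println(registry.networkMessages); // prints 2 (s2 and s3, not s4)
    }
}
```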

4. locked-caching
Locked caching is used when you want an item to be modified by only one server at a time. For example, an order item (commerceItem) can be modified both by a user on the ATG site and by a customer service representative on another server.
In this case, you want only one of them to modify that commerceItem at any given time. This is where locked caching comes into play.




WHEN TO USE?: For items that can be modified by multiple servers.
ADVANTAGE: Consistent data view.
DISADVANTAGE: This cache mode is write-oriented rather than read-oriented (unlike simple, distributed, etc.), so it cannot be directly compared with the other cache modes.
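The "one server at a time" rule can be sketched as a shared lock table keyed by repository ID. ItemLockManager is a hypothetical class; a real lock manager would typically make the other server wait for the lock, whereas this sketch simply refuses it so the outcome is easy to see.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the idea behind locked caching: a server must own an
// item's write lock before modifying it, and only one server can own it at a
// time. This sketch refuses the lock instead of making the caller wait.
class ItemLockManager {
    private final Map<String, String> owners = new HashMap<>(); // repository ID -> server

    synchronized boolean tryLock(String repositoryId, String serverName) {
        String owner = owners.get(repositoryId);
        if (owner == null) {
            owners.put(repositoryId, serverName);
            return true;
        }
        return owner.equals(serverName); // the current owner may lock again
    }

    synchronized void unlock(String repositoryId, String serverName) {
        if (serverName.equals(owners.get(repositoryId))) {
            owners.remove(repositoryId); // only the owner can release the lock
        }
    }

    public static void main(String[] args) {
        ItemLockManager locks = new ItemLockManager();
        System.out.println(locks.tryLock("ci1001", "storefront")); // prints true
        System.out.println(locks.tryLock("ci1001", "csc"));        // prints false
        locks.unlock("ci1001", "storefront");
        System.out.println(locks.tryLock("ci1001", "csc"));        // prints true
    }
}
```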


For more in-depth detail on caching, you can refer to the ATG docs HERE.

Now that we have covered the last topic on repositories, we will move on to the much-awaited commerce articles!








14 comments:

  1. Hi, it is a nice explanation. Can you explain the differences between the modes? What is the use of the inherit cache mode?
    I want to apply the cache at item-descriptor level, but I don't want that cache mode to apply at property level. How do I write that? Please explain.

  2. Hi Lakshman,

    1. DIFFERENCE BETWEEN CACHE MODES: You can find the differences between cache modes in the article above, in the "When to use", "Advantage" and "Disadvantage" points I have written for every cache mode.
    Typically, an interviewer would expect you to describe when to use which cache mode.


    2. APPLY CACHE AT ITEM-DESCRIPTOR LEVEL BUT NOT PROPERTY-LEVEL: The default cache mode is "simple".
    The "cache-mode" attribute can be used in both the <item-descriptor> tag and the <property> tag.
    You can define the cache mode you want at item-descriptor level and then set a different one on the property:
    <property name="your property name" cache-mode="simple" />
    This way, your property will be cached in "simple" cache mode, no matter what your item-descriptor cache mode is.

  3. Can you post the installation steps for the entire application in video format? I am unable to follow the installation steps provided in the commerce documentation.

    1. Hi Praneeth,

      The documentation around setting up ATG is pretty straightforward, but
      I have the tutorial on my list and will post it once I complete the commerce articles.

    2. OK, thank you for providing the best material.

  4. Hi Monis ,

    Do you have any idea about building and deploying code using the Ant tool in an ATG application?

    1. Hi Bharath,
      Ant is an automation tool used for many tasks.
      You can compile Java files from a particular location, copy/add/remove files and move them to another location.
      You'll have to go through the Ant documentation to write a script for that.
      Please refer to ATG's runAssembler for code building. Once you get the hang of it, you can write an Ant script to automate the task.

      I will be sharing an article on application assembly and deployment, you can also choose to wait for that.

  5. Hello Monis,

    Can you please provide a tutorial on using the ATG REST module? A brief guide on how to configure and use the OOTB web services would be a great help.

    1. Hi,

      Surely, that article is on the list.
      Currently, we are working on commerce articles.
      Once commerce and BCC articles are done, we'll go ahead with the REST modules.

  6. Hi Monis,
    Grasping ATG from the Oracle documentation is challenging even for an experienced engineer, and you made it possible. Thank you.

    Couple of questions:
    1. What is the best caching strategy for OrderRepository?
    We have enabled caching (simple mode) with a timeout of 15 min, and we're seeing tons of ConcurrentUpdateExceptions. My answer is that we should disable caching for OrderHistory and, if needed, use profile locks around order updates so that updates from multiple apps can be coordinated.
    2. Can you elaborate on cache-mode inherit at property level?

    Can you please share your experience

    1. Hi Sudhakar,

      Firstly, thanks for the appreciation. We're glad that people visit this blog and have made it a success.

      Your Questions:
      1. Cache on the order repository: It depends on the item descriptor, but in most cases you should use locked caching.

      Why not "simple" cache?: Order is an item descriptor that is updated very frequently, and by the user. If the user logs in via multiple browsers, or the order is modified via the mobile app/mobile site, or the order is modified in CSC, a ConcurrentUpdateException is bound to occur. This happens because the version of the order in your repository and the version in your session do not match.
      Instead, you should use simple caching for item descriptors with a very low update frequency, such as those in the ProductCatalog, which is mostly modified via BCC. Also, during a BCC deployment, the cache is automatically refreshed. Therefore, this gives very high performance with data consistency.
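The version mismatch described above can be sketched as an optimistic version check. StoredOrderSketch and StaleVersionException are hypothetical stand-ins (not ATG's order classes or ConcurrentUpdateException); they only show why a save based on an outdated version gets rejected.

```java
// Hypothetical model of the version check behind a concurrent-update failure:
// a save succeeds only if the version read by the session is still the stored
// version. StoredOrderSketch and StaleVersionException are illustrative names,
// not ATG classes.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

class StoredOrderSketch {
    private int version = 1;
    private String state = "INCOMPLETE";

    synchronized int readVersion() {
        return version;
    }

    synchronized String state() {
        return state;
    }

    synchronized void save(int versionReadBySession, String newState) {
        if (versionReadBySession != version) {
            throw new StaleVersionException(
                    "session has version " + versionReadBySession + ", repository has " + version);
        }
        state = newState;
        version++; // every successful save bumps the version
    }

    public static void main(String[] args) {
        StoredOrderSketch order = new StoredOrderSketch();
        int browserVersion = order.readVersion(); // user on the site
        int cscVersion = order.readVersion();     // agent in CSC
        order.save(browserVersion, "ITEM_ADDED");
        try {
            order.save(cscVersion, "ITEM_REMOVED");
        } catch (StaleVersionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```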

      Why locked caching?: With locked caching, while one instance is modifying the order, no other instance can modify it. The order stays consistent and no ConcurrentUpdateExceptions should occur.

      Why not disabled cache mode?: Firstly, it will not prevent concurrent modifications, as the order is session-based.
      Secondly, it would have a very high performance impact, as every access to the order would incur a database hit, and the order is something that is frequently accessed and modified.

      2. cache-mode="inherit" at property level: The "inherit" mode comes into play when you extend a repository's XML definition file.
      Suppose that in the base XML (this might be the out-of-the-box file), the cache mode of the property is "disabled".
      Now, in your custom XML, you want this property to use the same cache mode as its parent item descriptor. In this case, you specify it as "inherit".
      Then, if the cache mode of the item descriptor is changed later, the cache mode of the property changes as well and stays in sync with the parent item descriptor's cache mode.
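How a property's effective cache mode resolves can be sketched in a few lines. ItemDescriptorModel is a hypothetical class, and treating a property with no explicit cache mode as "inherit" is an assumption of this sketch; in a real repository these values come from the cache-mode attributes in the XML definition files.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of resolving a property's effective cache mode: an explicit
// property-level mode wins, while "inherit" follows the parent item descriptor
// and therefore tracks later changes to it. Treating a property with no
// explicit mode as "inherit" is an assumption of this sketch.
class ItemDescriptorModel {
    String cacheMode;
    final Map<String, String> propertyModes = new HashMap<>();

    ItemDescriptorModel(String cacheMode) {
        this.cacheMode = cacheMode;
    }

    String effectiveCacheMode(String propertyName) {
        String mode = propertyModes.getOrDefault(propertyName, "inherit");
        return mode.equals("inherit") ? cacheMode : mode;
    }

    public static void main(String[] args) {
        ItemDescriptorModel user = new ItemDescriptorModel("simple");
        user.propertyModes.put("password", "disabled"); // explicit override
        user.propertyModes.put("email", "inherit");
        System.out.println(user.effectiveCacheMode("email"));    // prints simple
        user.cacheMode = "locked";                               // descriptor changed later
        System.out.println(user.effectiveCacheMode("email"));    // prints locked
        System.out.println(user.effectiveCacheMode("password")); // prints disabled
    }
}
```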

      Hope this answer satisfies you.
      Please reach out to me if you need any further help. :)

  7. Hi Monis,

    What caching-mode should be used on profile repository?

    1. The profile repository is mainly used for displaying user data, which is generally modified by the user.
      You can use "simple" cache mode, as data changed by the user will be reflected on the same server. You can also set shorter timeouts.
      Also, ATG advises using "disabled" cache mode for the "password" property.

