Who’s who in the database world: C. Mohan (The ARIES Algorithm)

Hi guys,

Welcome to an atypical post!

In this post I want to tell you a little bit about the history of the vast world of databases.

I will take this opportunity to talk about database theory.
I know! ... it’s been a while!

Today we talk about Dr. C. Mohan. But who is Dr. Mohan?

Chandrasekaran Mohan is an Indian-born American computer scientist.

Does this name mean anything to you?

What if I told you that it was the person who conceived the ARIES family of algorithms?

But let’s start from the beginning...

Enjoy!

C. Mohan and the ARIES Algorithm

Chandrasekaran Mohan is an Indian-born American computer scientist, born in Tamil Nadu (India) in 1955.

After growing up there and finishing his undergraduate studies in Chennai, he moved to the United States in 1977 for graduate studies.

He received his PhD in computer science from the University of Texas at Austin in 1981.

After finishing his PhD in the database area in December 1981, Mohan joined IBM Research, the research and development division of IBM, in San Jose, where he worked on projects like R*, Starburst, Exotica, and DBCache.

At IBM Research he became the primary inventor of the ARIES family of algorithms; the main ARIES paper was published in 1992.

ARIES, an acronym of "Algorithms for Recovery and Isolation Exploiting Semantics", is a recovery algorithm used by IBM Db2, Microsoft SQL Server and many other database systems.

His famous article, entitled "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging", is a piece of database history.

More recently he has worked on other projects, related in particular to storage class memories, Big Data, Hybrid Transactional/Analytical Processing (HTAP) improvements to IBM Db2 and Apache Spark, and Blockchain and Distributed Ledger technologies.

But let’s talk about ARIES...

ARIES: Algorithms for Recovery and Isolation Exploiting Semantics

One of the most important aspects of transaction processing, and one of its hardest problems, is guaranteeing durability (the letter "D" of the ACID acronym; I wrote about it here: SQL Server: Transactions, Lock e Deadlock. A little bit of theory explained in a simple way!).

In short, the effects of committed transactions must survive a failure of the system.

A (near-ubiquitous) way to guarantee this is to perform logging.

Transaction executions are recorded in a log on non-volatile media such as hard drives or SSDs, and these media should be fault tolerant.

Well, nowadays, the canonical algorithm for implementing a “No Force, Steal” recovery manager based on write-ahead logging (WAL) is the ARIES algorithm, conceived and developed by Mohan in the IBM laboratories.

This algorithm is "No Force" because the database need not write dirty pages to disk at commit time. It is also called "Steal" because the database can flush dirty pages to disk at any time, even before the transaction that modified them has committed.

These two properties are present in almost every commercial RDBMS. They allow high performance, even if they add complexity to the database.
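To make the two properties a bit more concrete, here is a minimal Python sketch. It is only a toy: the class ToyBufferManager and its methods are names I made up for illustration, the "disk" and the "log" are plain in-memory structures, and nothing here is taken from Db2 or SQL Server.

```python
class ToyBufferManager:
    """Toy illustration of a "No Force, Steal" policy (hypothetical, not real ARIES code)."""

    def __init__(self):
        self.disk = {}         # page_id -> value persisted on "disk"
        self.cache = {}        # page_id -> value held in memory
        self.dirty = set()     # pages changed in memory but not yet on disk
        self.stable_log = []   # log records, assumed to be safely on disk

    def write(self, txn, page_id, value):
        # Write-ahead logging: the log record is written before the page changes.
        old = self.cache.get(page_id, self.disk.get(page_id))
        self.stable_log.append((txn, page_id, old, value))
        self.cache[page_id] = value
        self.dirty.add(page_id)

    def steal(self, page_id):
        # "Steal": a dirty page may go to disk at any time, even before commit.
        self.disk[page_id] = self.cache[page_id]
        self.dirty.discard(page_id)

    def commit(self, txn):
        # "No Force": commit only appends a commit record; dirty pages stay in memory.
        self.stable_log.append((txn, "COMMIT"))


bm = ToyBufferManager()
bm.write("T1", "A", 10)
bm.steal("A")      # the uncommitted change of T1 reaches the disk
bm.write("T2", "B", 20)
bm.commit("T2")    # T2 is committed, yet page B is still only in memory
```

After these calls the disk holds a change from a transaction that never committed (page A) and is missing a change from one that did (page B). This is exactly why the log must carry enough information to both undo and redo.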

As we just said, one of the key elements is the log.

The log is used to ensure that committed actions are reflected in the database, and that uncommitted actions are undone.

We can see the log as a single sequential file (ever-growing, because it is append-only) where each log record has a unique log sequence number (LSN).

LSNs are assigned in increasing order: each new record gets a higher LSN than the one before it.
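The idea of an append-only log with ever-growing LSNs can be sketched in a few lines of Python. The class name ToyLog is hypothetical, and using a simple counter as the LSN is my assumption; real systems often derive the LSN from the position of the record in the log file.

```python
import itertools


class ToyLog:
    """Append-only list of (lsn, payload) pairs; a hypothetical stand-in for a log file."""

    def __init__(self):
        self._records = []
        self._next_lsn = itertools.count(1)   # 1, 2, 3, ... never reused

    def append(self, payload):
        # Records are only ever added at the end; each one gets the next LSN.
        lsn = next(self._next_lsn)
        self._records.append((lsn, payload))
        return lsn

    def records(self):
        # Return a copy so callers cannot rewrite history.
        return list(self._records)


log = ToyLog()
first = log.append("T1 updates page A")
second = log.append("T1 commits")
```

Because there is no method to change or delete a record, the order of the LSNs always matches the order in which things happened.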

For performance reasons, log records are first stored in a log buffer in memory.

Only at certain times are these records saved to the log on disk.

This may happen, for example, when the log buffers are full or when a transaction commits.
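The two flush triggers just described can be sketched like this. It is a toy under stated assumptions: BufferedLog, its capacity parameter and the on_disk list are hypothetical names, and a real engine would write to a file and wait for the I/O to complete.

```python
class BufferedLog:
    """Toy log buffer: records reach "disk" only when the buffer fills or on commit."""

    def __init__(self, capacity=3):
        self.capacity = capacity   # assumed buffer size, in records
        self.buffer = []           # records still only in memory
        self.on_disk = []          # list standing in for the log file

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self.flush()           # trigger 1: the buffer is full

    def commit(self, txn):
        self.buffer.append(("COMMIT", txn))
        self.flush()               # trigger 2: a transaction commits

    def flush(self):
        # Move everything buffered so far to the log on disk, preserving order.
        self.on_disk.extend(self.buffer)
        self.buffer.clear()
```

Flushing at commit is what makes the commit durable: once commit returns, every record of that transaction is in the log on disk, even if its data pages are not.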

The log can contain different types of information:

An undo-only record contains the information needed to reverse a change made by a transaction.

A redo-only record contains the information needed to redo a change made by a transaction.

If a log record contains both kinds of information, it is called an undo-redo log record.
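The three kinds of record can be pictured with a small Python dataclass. This is only a sketch: the field names (before, after, page_id) are my own, and I assume that the undo information is a before-image and the redo information an after-image, which is just one possible way to store them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    txn: str
    page_id: str
    before: Optional[int] = None   # undo information (value before the change)
    after: Optional[int] = None    # redo information (value after the change)

    @property
    def kind(self) -> str:
        # Classify the record by which information it carries.
        if self.before is not None and self.after is not None:
            return "undo-redo"
        if self.before is not None:
            return "undo-only"
        if self.after is not None:
            return "redo-only"
        return "no-op"


rec = LogRecord(lsn=1, txn="T1", page_id="A", before=5, after=7)
```

Here rec.kind is "undo-redo", because the record carries both the old and the new value.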

All this information stored in the log is necessary for recovery from failure. Failures can be of the following types:

Failure of a transaction (such that its updates need to be undone).
Failure of the database management system itself – in this scenario we assume that volatile storage contents are lost and recovery must be performed using the nonvolatile versions of the database and log.
Failure of media/device – in this scenario the contents of just that media are lost, and the lost data must be recovered using an image copy (archive dump) version of the lost data plus the log. Recovery independence is the notion that it should be possible to perform media recovery or restart recovery of objects at different granularities rather than only at the entire database level.

Now, with the concepts just seen, we can mention the three phases of the recovery process:

Analysis:
The recovery subsystem determines the earliest log record from which the next pass must start. It also scans the log forward from the checkpoint record to construct a snapshot of what the system looked like at the instant of the crash.
Redo:
Starting at the earliest LSN, the log is read forward and each update is redone.
Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
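The three phases above can be sketched as a toy Python function. Please take it as a heavily simplified illustration and not as the real ARIES: there are no checkpoints, no dirty page table and no compensation log records, every pass starts from the beginning of the log, and the record layout (dictionaries with lsn, txn, type, page, before, after) is an assumption of mine.

```python
def recover(log, pages):
    """Toy three-pass recovery. `log` is a list of dicts, `pages` the on-disk state."""
    # Analysis: scan forward and find the loser transactions (no COMMIT record).
    seen, committed = set(), set()
    for rec in log:
        seen.add(rec["txn"])
        if rec["type"] == "COMMIT":
            committed.add(rec["txn"])
    losers = seen - committed

    # Redo: read the log forward and reapply every update, winners and losers alike.
    for rec in log:
        if rec["type"] == "UPDATE":
            pages[rec["page"]] = rec["after"]

    # Undo: scan backward and restore the before-image of every loser update.
    for rec in reversed(log):
        if rec["type"] == "UPDATE" and rec["txn"] in losers:
            pages[rec["page"]] = rec["before"]
    return losers


log = [
    {"lsn": 1, "txn": "T1", "type": "UPDATE", "page": "A", "before": 0, "after": 10},
    {"lsn": 2, "txn": "T2", "type": "UPDATE", "page": "B", "before": 0, "after": 20},
    {"lsn": 3, "txn": "T1", "type": "COMMIT"},
    {"lsn": 4, "txn": "T2", "type": "UPDATE", "page": "A", "before": 10, "after": 30},
]
pages = {"A": 0, "B": 20}    # state found on disk after the crash
losers = recover(log, pages)
print(losers, pages)         # {'T2'} {'A': 10, 'B': 0}
```

In this example T1 committed but its change to page A never reached the disk, so redo brings it back. T2 did not commit but its change to page B was already on disk, so undo removes it.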

I won't go into details, but I want to point out that we find all the work done by C. Mohan inside our SQL Server!

You can also find more details in this post: https://sqlserverperformace.blogspot.com/2019/11/sql-server-2019-and-accelerated.html

That's all for today!

I hope you enjoyed this post,

Luke

