Bug in HANA version 96



Case Study:

HANA DB Version : 96

Symptoms:

Applications cannot connect to the SID database, and the load keeps increasing.
The system is currently down.
 
DBA findings:

Additional findings on the database hang in system SID:

1.     The graph below shows that transactions started blocking at 1:15 PM UTC, and the blocking stayed at its peak until 2:30 PM UTC.


 
Many UPDATE statements started running from 1:45 UTC, waiting on a shared lock with lock_wait_name "ConsistentChangeLock".
From the trace file:
SQL: UPSERT "";18446744073709551615;"ConsistentChangeLock";18446744073709551615;"(no owner)";1449286276373426 [local];922377;959254365;0;65341974842;231824;140468094595072;1183744;65536
"WorkerThread (StatisticsServer)";23088474544;27927;139436760117248;"Inactive";-1;"";"";-1;-1;"WorkerThread (St";"";"ready";"";18446744073709551615;"WorkerThreadControl";18446744073709551615;"(no owner)";1442715024771187 [local];1098;247640654;0;6651882397814;302461;140372804837376;1183744;65536
"JobWrk0236";32958744069;53075;140404660824096;"Inactive";-1;"";"";-1;-1;"JobWorker";"";"";"";18446744073709551615;"jx-parking";18446744073709551615;"(no owner)";1444758947510585 [local];11358036;251069532;0;4569969975350;27733;140015916625920;1183744;65536
"SqlExecutor";44657262666;11313;140438123827200;"SharedLock Enter";458062;"";"";230;1109164195;"SqlExecutor";"";"";"

2.     At the same time, a savepoint started running at 2:15 UTC, and the start of its critical phase was also delayed, as shown below:


 
Findings from the indexserver trace file (call stack for the savepoint):

43808816322[thr=62755]: SqlExecutor at
1: 0x00007fca4737e279 in syscall+0x15 (libc.so.6)
2: 0x00007fca587625d3 in Synchronization::ReadWriteLock::timedWaitLockSharedLL(Execution::Context&, unsigned long, unsigned long, bool)+0x410 at LinuxFutexOps.hpp:53 (libhdbbasis.so)
3: 0x00007fca5875d033 in Synchronization::ReadWriteLock::lockShared(Execution::Context&, unsigned long)+0x50 at ReadWriteLock.cpp:1203 (libhdbbasis.so)
4: 0x00007fca56cb536d in DataAccess::SavepointLock::lockShared(unsigned long, DataAccess::ReorgScopeLockMode)+0x39 at ReadWriteLock.hpp:1082 (libhdbdataaccess.so)
5: 0x00007fca56cbd5cf in DataAccess::SavepointSPI::lockSavepoint(unsigned long&, unsigned long, bool, DataAccess::ReorgScopeLockMode)+0x6b at SavepointImpl.cpp:1395 (libhdbdataaccess.so)
6: 0x00007fca56fc19fd in TransactionManager::RollbackLogRecord::logRollback(DataAccess::PersistenceSession&, long, DataAccess::RedoLogCallback*, void*, bool)+0xe9 at RollbackLogRecord.cpp:174 (libhdbdataaccess.so)
7: 0x00007fca4d97a6af in ptime::Transaction::write_trans_end_log_normal(ptime::Transaction::TransState, ptime::PostcommitHandler*, bool, bool)+0x44b at translog.cc:479 (libhdbrskernel.so)
8: 0x00007fca4d995231 in ptime::Transaction::preabort(ptime::PostcommitHandler*, session::SessionStateCallBack*, bool)+0x1b50 at transmgmt.cc:4671 (libhdbrskernel.so)
9: 0x00007fca4d999177 in ptime::Transaction::abort(bool)+0x203 at transmgmt.cc:4005 (libhdbrskernel.so)
10: 0x00007fca4d4a9c2a in ptime::Connection::rollback(Execution::Context&)+0x5d6 at Connection.cc:1451 (libhdbrskernel.so)
11: 0x00007fca4d4b11dd in ptime::Connection::handleException_(ptime::PtimeException const&, bool, bool, bool, bool, ptime::Statement*)+0x8a9 at Connection.cc:1754 (libhdbrskernel.so)
12: 0x00007fca4d501bd0 in ptime::Statement::handleException_(Execution::Context&, ptime::PtimeException const&, bool)+0x170 at Statement.cc:4182 (libhdbrskernel.so)

3.     Many long-running threads are in status "ExclusiveLock Enter" with lock_wait_name "BTree GuardContainer". The thread holding the consistent change lock waits until the reader holding the "BTree GuardContainer" semaphore releases it.


Call stack of update statements:

285285[thr=60608]: SqlExecutor at
1: 0x00007fca4737e279 in syscall+0x15 (libc.so.6)
2: 0x00007fca5876995f in Synchronization::BinarySemaphore::timedWait(unsigned long, Execution::Context&)+0x1bb at LinuxFutexOps.hpp:53 (libhdbbasis.so)
3: 0x00007fca62b5f025 in AttributeEngine::BtreeAttribute::_acquireRetry(int&, AttributeEngine::BtreeAttribute::GuardContainerLocks*&)+0x3c1 at BTreeHelper.cpp:64 (libhdbcs.so)
4: 0x00007fca6402f45f in AttributeEngine::BTreeAttribute::commitOptimizeForStrings(AttributeEngine::CommitOptimizeEnv&)+0x63b at BTreeHelper.h:61 (libhdbcs.so)
5: 0x00007fca5a136ddd in AttributeEngine::AttributeIndexJob::run()+0x189 at AttributeApi.cpp:1078 (libhdbcs.so)
6: 0x00007fca5a102ef4 in AttributeEngine::AttributeApi::commitOptimizeAttributes(ltt::smartptr_handle, bool, bool, bool)+0x1250 at AttributeApi.cpp:1450 (libhdbcs.so)
7: 0x00007fca68bf5d56 in TRexAPI::TableUpdate::writeDataIntoAttributeEngine(TrexBase::IndexName const&, bool)+0x162 at TableUpdate.cpp:12278 (libhdbcsapi.so)
8: 0x00007fca68bf69a8 in TRexAPI::TableUpdate::doConsistentChange(TrexBase::IndexName const&, bool)+0x114 at TableUpdate.cpp:15628 (libhdbcsapi.so)
9: 0x00007fca68bf4e15 in TRexAPI::TableUpdate::execute_update_using_keys(TRexCommonObjects::UdivMapping::Source, ltt_adp::vector >&)+0x1a71 at TableUpdate.cpp:2904 (libhdbcsapi.so)
10: 0x00007fca68c0d333 in TRexAPI::TableUpdate::execute_local()+0x15a0 at TableUpdate.cpp:2223 (libhdbcsapi.so)
11: 0x00007fca68c0db9e in TRexAPI::TableUpdate::dispatchCallToNet()+0xfa at TableUpdate.cpp:1884 (libhdbcsapi.so)
12: 0x00007fca68c03888 in TRexAPI::TableUpdate::execute_context()+0x734 at TableUpdate.cpp:1670 (libhdbcsapi.so)
13: 0x00007fca68c04471 in TRexAPI::TableUpdate::executeWithRetry()+0x10 at TableUpdate.cpp:1434 (libhdbcsapi.so)
14: 0x00007fca68c045d5 in TRexAPI::TableUpdate::execute(TRexAPI::QueryContext*, bool)+0x111 at TableUpdate.cpp:1414 (libhdbcsapi.so)
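
When a stack like the ones above runs to a dozen frames, it can help to pull out only the frames that mention a lock. The following is a minimal sketch, not an SAP tool. It assumes each frame follows the "N: address in function+offset at file:line (library)" layout seen in this trace, and the names FRAME, parse_frame, lock_frames and SAMPLE are hypothetical, chosen only for this illustration.

```python
import re

# Frame layout assumed from the trace above:
#   N: 0xADDRESS in FUNCTION+0xOFFSET [at FILE:LINE] (LIBRARY)
FRAME = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<address>0x[0-9a-fA-F]+) in "
    r"(?P<function>.+?)\+0x[0-9a-fA-F]+"
    r"(?: at (?P<source>\S+):(?P<line>\d+))?"
    r" \((?P<library>[^()]+)\)\s*$"
)


def parse_frame(text):
    """Return the parts of one stack frame as a dict, or None if it does not match."""
    match = FRAME.match(text)
    return match.groupdict() if match else None


def lock_frames(lines):
    """Return (frame number, function) for frames whose function name mentions a lock."""
    found = []
    for text in lines:
        frame = parse_frame(text)
        if frame and "lock" in frame["function"].lower():
            found.append((int(frame["index"]), frame["function"]))
    return found


# Three frames copied from the savepoint stack above.
SAMPLE = [
    "1: 0x00007fca4737e279 in syscall+0x15 (libc.so.6)",
    "4: 0x00007fca56cb536d in DataAccess::SavepointLock::lockShared(unsigned long, "
    "DataAccess::ReorgScopeLockMode)+0x39 at ReadWriteLock.hpp:1082 (libhdbdataaccess.so)",
    "9: 0x00007fca4d999177 in ptime::Transaction::abort(bool)+0x203 "
    "at transmgmt.cc:4005 (libhdbrskernel.so)",
]
```

Calling lock_frames(SAMPLE) keeps only frame 4, the SavepointLock::lockShared call, which is where this SqlExecutor thread is waiting.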



Solution:

1.     This is a HANA bug seen in revision 96.

2.     As per SAP Note 2214279, the fix is to upgrade to revision 97.02, which releases the consistent change lock after a short timeout interval before trying again, and so avoids this blocking situation.

3.     A workaround is available without upgrading the database: change the setting below until the system is upgraded to the next revision.

- If you are running a revision equal to or higher than 96 and lower than 97.02, set the following parameter in the indexserver.ini configuration file:

section: [delta]

parameter name: cch_reopening_enabled

parameter value: true

This parameter takes effect immediately; it does not require a restart.
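
The revision check and the parameter change can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the revision is assumed to be given in the short form used in this post ("96", "97.02"), and the 'SYSTEM' layer in the generated statement is an assumption. The statement itself uses the standard ALTER SYSTEM ALTER CONFIGURATION syntax and is only built as a string here, not executed.

```python
def parse_revision(text):
    """Turn a short revision string such as '96' or '97.02' into a comparable tuple."""
    major, _, patch = text.strip().partition(".")
    return int(major), int(patch or 0)


def needs_workaround(revision):
    """True for revisions equal to or higher than 96 and lower than 97.02."""
    return (96, 0) <= parse_revision(revision) < (97, 2)


def workaround_statement():
    """SQL that sets cch_reopening_enabled = true in the [delta] section of indexserver.ini.

    The 'SYSTEM' layer is an assumption; WITH RECONFIGURE applies the change
    without a restart.
    """
    return (
        "ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') "
        "SET ('delta', 'cch_reopening_enabled') = 'true' WITH RECONFIGURE"
    )
```

For example, needs_workaround("96") is true and needs_workaround("97.02") is false, so the statement from workaround_statement() would only be run on the affected revisions.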

