Friday, December 21, 2012

数据库logfilsiz设置过大导致数据库无法启动总结



情况描述:
昨天碰到一台服务器,CPU时而高时低,内存持续满(计算内存40%,非计算内存60%),硬盘HDISK1持续100%,服务器上所有操作都很卡。该机器是测试环境的数据,在出现故障前开发进行了大量数据导入后来因为数据库异常而终止(不清楚终止情况)。
和开发确认上面没有重要数据后直接使用db2stop force停止实例。等待了30多分钟未见数据库停止,因为考虑到之前导入大量数据而终止可能是数据库正在回滚。而这台机器是虚拟机所以可能性能比较差。也从开发人员那边得知之前重启过实例但问题依旧。虽然感觉有点奇怪但还是决定等待db2stop force完成。

问题分析:
隔天上班时db2stop force已经完成,因为在db2stop时会持续在db2diag.log中记录Return code from sqleIsStopPending的消息,所以直接从db2diag.log看出日志持续写到凌晨4点钟,经过计算得出db2stop force执行了10个小时
实例停止后服务器硬件指标都降下来了,处于空闲状态。因为不完全确认故障问题所以没敢马上启动实例,而是先检查db2diag.log。下面是采取片段:

1970-02-08-05.02.59.137718+480 E108916A595 LEVEL: Error
PID : 463088 TID : 40428 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : DEVDB3
APPHDL : 0-55279 APPID: 182.180.80.97.42269.700207205957
AUTHID : DB2INST2
EDUID : 40428 EDUNAME: db2agent (DEVDB3) 0
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860
MESSAGE : ADM1823E The active log is full and is held by application handle
"55279.."1. Terminate this application by COMMIT, ROLLBACK or FORCE
APPLICATION.

1970-02-08-05.02.59.138258+480 I109512A555 LEVEL: Error
PID : 463088 TID : 40428 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : DEVDB3
APPHDL : 0-55279 APPID: 182.180.80.97.42269.700207205957
AUTHID : DB2INST2
EDUID : 40428 EDUNAME: db2agent (DEVDB3) 0
FUNCTION: DB2 UDB, data protection services, sqlpWriteLR, probe:6680
MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE
"Log File has reached its saturation point"
DIA8309C Log file was full2.
1970-02-08-05.07.32.388912+480 I119224A453 LEVEL: Event
PID : 463088 TID : 25305 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
APPHDL : 0-55339 APPID: *LOCAL.db2inst2.700207210732
AUTHID : DB2INST2
EDUID : 25305 EDUNAME: db2agent (instance) 0
FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:20
CHANGE : CFG DB DEVDB3: "Logfilsiz" From: "1024" To: "1048572"3

1970-02-08-05.07.52.489029+480 I123544A395 LEVEL: Event
PID : 1167512 TID : 1 PROC : db2stop2
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:104
DATA #1 : String, 26 bytes
Force phase is in progress
DATA #2 : Hexdump, 4 bytes
0x0FFFFFFFFFFFA024 : 0000 0000 ....

1970-02-08-05.07.52.489442+480 I123940A477 LEVEL: Event
PID : 463088 TID : 258 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, base sys utilities, sqeAppServices::ExecuteStopForce, probe:1000
DATA #1 : String, 47 bytes
[Force]->Number of applications to be forced :
DATA #2 : Hexdump, 4 bytes
0x07000000003FC354 : 0000 0041 ...A

1970-02-08-05.07.55.392580+480 I129172A343 LEVEL: Error
PID : 573660 TID : 1 PROC : db2fmp (idle) 0
INSTANCE: db2inst2 NODE : 000
EDUID : 1 EDUNAME: db2fmp (idle) 0
FUNCTION: DB2 UDB, common communication, sqlccipcdarihandshake, probe:3
RETCODE : ZRC=0x83000024=-2097151964

1970-02-08-05.07.55.393504+480 I129860A458 LEVEL: Error
PID : 1241196 TID : 1 PROC : db2fmp (idle) 0
INSTANCE: db2inst2 NODE : 000
EDUID : 1 EDUNAME: db2fmp (idle) 0
FUNCTION: DB2 UDB, routine_infrastructure, sqlerFmpListener, probe:999
MESSAGE : FMP exiting with error. Last agent registered:
DATA #1 : Hexdump, 4 bytes
0x0FFFFFFFFFFFEFB8 : 0000 0000

1970-02-08-05.07.55.416112+480 I137637A337 LEVEL: Error
PID : 1101954 TID : 1800 PROC : db2fmp (C) 0
INSTANCE: db2inst2 NODE : 000
EDUID : 1800 EDUNAME: db2fmp (C) 0
FUNCTION: DB2 UDB, common communication, sqlccipcdarihandshake, probe:3
RETCODE : ZRC=0x83000024=-2097151964

1970-02-08-05.07.55.851053+480 E145063A313 LEVEL: Event
PID : 1167512 TID : 1 PROC : db2stop2
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:911
MESSAGE : ADM7514W Database manager has stopped.4
STOP : DB2 DBM

1970-02-08-05.08.01.683376+480 E153925A406 LEVEL: Warning
PID : 847882 TID : 258 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloStartAIOCollectorEDUs, probe:30
MESSAGE : ADM0513W db2start succeeded5. However, no I/O completion port (IOCP)
is available.
1970-02-08-05.31.00.235498+480 E161290A328 LEVEL: Error (OS)
PID : 413762 TID : 1 PROC : db2star2
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, SQO Memory Management, sqloMemCreateSingleSegment, probe:100
CALLED : OS, -, shmget
OSERR : EEXIST (17) "File exists"

1970-02-08-05.36.36.688369+480 E161619A1231 LEVEL: Error (OS)
PID : 847882 TID : 5656 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : DEVDB3
APPHDL : 0-8 APPID: 182.180.58.170.4852.700207210804
AUTHID : DB2INST2
EDUID : 5656 EDUNAME: db2agent (DEVDB3) 0
FUNCTION: DB2 UDB, oper system services, sqlowrite, probe:70
MESSAGE : ZRC=0x850F000C=-2062614516=SQLO_DISK "Disk full."
DIA8312C Disk was full.
CALLED : OS, -, write
OSERR : ENOSPC (28) "No space left on device"
DATA #1 : File handle, PD_TYPE_SQO_FILE_HDL, 8 bytes
File Handle = 32
File System Block Size = 0 bytes
File System Type = UNKNOWN
File Handle Flags :
Require Sector Align = No
DIO/CIO Mode = No
Raw Block Device = No
Reserved Handle = No
Flush On Close = Yes
Thread-Level Lock = No
Write-through Mode = No
File Not Tracked = No
DATA #2 : unsigned integer, 8 bytes
262144
DATA #3 : signed integer, 8 bytes
114688
DATA #4 : String, 105 bytes
Search for ossError*Analysis probe point after this log entry for further
self-diagnosis of this problem.

1970-02-08-05.36.39.851436+480 I164363A522 LEVEL: Severe
PID : 847882 TID : 5656 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : DEVDB3
APPHDL : 0-8 APPID: 182.180.58.170.4852.700207210804
AUTHID : DB2INST2
EDUID : 5656 EDUNAME: db2agent (DEVDB3) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:1250
RETCODE : ZRC=0x85100084=-2062548860=SQLP_NO_SPACE_FOR_LOG
"Not enough space to create primary logs6"

1970-02-08-06.04.43.164196+480 I169962A1644 LEVEL: Event
PID : 1278142 TID : 1 PROC : db2stop
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, base sys utilities, sqleStartStopSingleNode, probe:1130
DATA #1 : String, 34 bytes
/home/db2inst2/sqllib/adm/db2stop2
DATA #2 : Hexdump, 256 bytes
0x0FFFFFFFFFFF4DB0 : 2F68 6F6D 652F 6462 3269 6E73 7432 2F73 /home/db2inst2/s
0x0FFFFFFFFFFF4DC0 : 716C 6C69 622F 6164 6D2F 6462 3273 746F qllib/adm/db2sto
0x0FFFFFFFFFFF4DD0 : 7032 004E 4F4D 5347 0046 4F52 4345 0053 p2.NOMSG.FORCE.S
0x0FFFFFFFFFFF4DE0 : 4E00 0000 0000 0000 0000 0000 0000 0000 N...............
0x0FFFFFFFFFFF4DF0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E00 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E10 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E20 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E30 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E40 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E50 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E60 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E70 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E80 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4E90 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0FFFFFFFFFFF4EA0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................

1970-02-08-06.08.40.541300+480 I172481A441 LEVEL: Error
PID : 847882 TID : 1543 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 1543 EDUNAME: db2tcpcm 0
FUNCTION: DB2 UDB, common communication, sqlcctcpconnmgr_child, probe:126
MESSAGE : Return code from sqleIsStopPending7 =
DATA #1 : Hexdump, 4 bytes
0x070000000EBFC434 : FFFF E86C

1970-02-08-06.15.08.922343+480 I303344A980 LEVEL: Error
PID : 434262 TID : 1 PROC : db2fm
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 Common, Generic Control Facility, GcfCaller::getState, probe:40
MESSAGE : ECF=0x9000028C=-1879047540=ECF_GCF_GCF_FUNCTION_TIMED_OUT
Timeout occured while calling a GCF interface function
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000006B5C84 pdOSSeLoggingCallback + 0x34
[1] 0x09000000020CCAC4 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x09000000020CC868 ossLog + 0x88
[3] 0x0900000004DCEBD4 getState__9GcfCallerFP12GCF_PartInfoUlP11GCF_RetInfo + 0xB4
[4] 0x0000000100003990 main + 0x2430
[5] 0x0000000100000290 __start + 0x98
[6] 0x0000000000000000 ?unknown + 0x0
[7] 0x0000000000000000 ?unknown + 0x0
[8] 0x0000000000000000 ?unknown + 0x0
[9] 0x0000000000000000 ?unknown + 0x0

1970-02-08-06.15.09.216185+480 I304325A1178 LEVEL: Error
PID : 434262 TID : 1 PROC : db2fm
INSTANCE: db2inst2 NODE : 000
EDUID : 1
FUNCTION: DB2 Common, Generic Control Facility, GcfCaller::getState, probe:40
MESSAGE : ECF=0x90000292=-1879047534=ECF_GCF_SERVICE_FUNCTION_FAILED
The GCF function ran properly but the service did not behave accordingly
CALLED : DB2 Common, Generic Control Facility, gcf_reset
RETCODE : ECF=0x90000298=-1879047528=ECF_GCF_FAILURE
Failure as reported through GCF standard (GCF_FAILURE)
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000006B5C84 pdOSSeLoggingCallback + 0x34
[1] 0x09000000020CCAC4 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x09000000020CCE50 ossLogRC + 0x70
[3] 0x0900000004DCEC80 getState__9GcfCallerFP12GCF_PartInfoUlP11GCF_RetInfo + 0x160
[4] 0x0000000100003990 main + 0x2430
[5] 0x0000000100000290 __start + 0x98
[6] 0x0000000000000000 ?unknown + 0x0
[7] 0x0000000000000000 ?unknown + 0x0
[8] 0x0000000000000000 ?unknown + 0x0
[9] 0x0000000000000000 ?unknown + 0x0



重要的信息都已经使用红色进行标注。从上面的信息我们可以分析出开发人员的操作流程:
  1. Appl handle 55279进行数据的导入
  2. 导入过程中出现了日志空间满的错误信息
  3. 开发人员修改logfilsiz大小,将4M 调整为 4G
  4. 调整后为了让参数生效重启了实例
  5. 实例重启后出现该故障

第一次处理:
db2diag.log中看到的直接错误信息就是空间满已经主日志空间不足。怀疑是硬盘空间问题,使用df –g检查文件系统容量。未发现空间满的情况。因为开发人员说过实例重启后还是出现相同问题。所以觉得决定先启动实例看看情况。操作如下:
  1. 切换到实例账号下
  2. cd sqllib/db2dump
  3. mv db2diag.log db2diag.log.2
  4. db2start

重命名db2diag.log的原因是为了方便查看实例的启动日志。db2start后很快实例就启动起来,观察topas发现大概过了2-3分钟后故障再次出现:


从截图可以看出hdisk1上的Busy%又跑到100%左右。而内存也被计算和非计算得占满。因为机器太慢,直接来了把db2_kill

$ db2_kill
ipclean: Removing DB2 engine and client's IPC resources for db2inst2.
ipclean: Error 22(Invalid argument) removing ipc 5242886
ipclean: Error 22(Invalid argument) removing ipc 41943051
ipclean: Error 22(Invalid argument) removing ipc 35651604
ipclean: Error 22(Invalid argument) removing ipc 114294808
$ db2_ps
Node 0
UID PID PPID C STIME TTY TIME CMD

杀掉进程后可以使用db2_ps确定进程是否已经杀死。
同时检查了db2diag.log的启动信息,发现有下面错误点:

1970-02-08-21.49.28.547863+480 E7139A1112 LEVEL: Error (OS)
PID : 1265796 TID : 258 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloAIXLoadModuleTryShr, probe:130
CALLED : OS, -, dlopen
OSERR : ENOEXEC (8) "Exec format error"
MESSAGE : Attempt to load specified library failed.
DATA #1 : Library name or path, 40 bytes
/home/db2inst2/sqllib/lib64/libdb2iocp.a
DATA #2 : shared library load flags, PD_TYPE_LOAD_FLAGS, 4 bytes
2
DATA #3 : String, 513 bytes
Symbol resolution failed for /home/db2inst2/sqllib/lib64/libdb2iocp.a because:
Symbol CreateIoCompletionPort (number 0) is not exported from dependent
module /unix.
Symbol GetQueuedCompletionStatus (number 1) is not exported from dependent
module /unix.
Symbol GetMultipleCompletionStatus (number 2) is not exported from dependent
module /unix.
Could not load module /home/db2inst2/sqllib/lib64/libdb2iocp.a.
System error: Exec format error
Examine .loader section symbols with the 'dump -Tv' command.

1970-02-08-21.49.28.550520+480 E8252A861 LEVEL: Error (OS)
PID : 1265796 TID : 258 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloAIXLoadModuleTryShr, probe:140
CALLED : OS, -, dlopen
OSERR : ENOENT (2) "No such file or directory"
MESSAGE : Attempt to load specified library augmented with object name failed.
DATA #1 : Library name or path, 50 bytes
/home/db2inst2/sqllib/lib64/libdb2iocp.a(shr_64.o)
DATA #2 : shared library load flags, PD_TYPE_LOAD_FLAGS, 4 bytes
262146
DATA #3 : String, 213 bytes
Could not load module /home/db2inst2/sqllib/lib64/libdb2iocp.a(shr_64.o).
File /home/db2inst2/sqllib/lib64/libdb2iocp.a is not an
archive or the file could not be read properly.
System error: Exec format error

1970-02-08-21.49.28.551105+480 I9114A568 LEVEL: Severe
PID : 1265796 TID : 258 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloLioInitIocp, probe:200
CALLED : DB2 UDB, oper system services, sqloLoadModule
RETCODE : ZRC=0x870F009B=-2029059941=SQLO_MOD_LOAD_FAILED
"Dynamic library load failed."
DATA #1 : Library name or path, 12 bytes
libdb2iocp.a
DATA #2 : Library Search Path, 27 bytes
/home/db2inst2/sqllib/lib64

虽然看不出libdb2iocp.a文件时干嘛的,但是可以从sqloAIXLoadModuleTryShr看出这应该是何系统有关资源或者接口调用问题。经过检查发现libdb2iocp.a文件时存在的,而且权限也没有问题。没法再分析下去,所以google了以下在ibm网站上找到相关资料:

The DB2DIAG.LOG reports an ENOEXEC (8) error on a DLOPEN call
Problem(Abstract)
The db2diag.log reports several memory related error messages. This is commonly seen after a recent install/upgrade.

Cause

AIX I/O Completion Ports (IOCP) are not enabled.

这个问题和我的一模一样,环境也合适,所以就跟着整了一把。

$ oslevel -s
5300-11-00-0000
$ lslpp -l bos.iocp.rte
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
bos.iocp.rte 5.3.11.0 COMMITTED I/O Completion Ports API

Path: /etc/objrepos
bos.iocp.rte 5.3.11.0 COMMITTED I/O Completion Ports API
$ lslpp -l bos.iocp.rte
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
bos.iocp.rte 5.3.11.0 COMMITTED I/O Completion Ports API

Path: /etc/objrepos
bos.iocp.rte 5.3.11.0 COMMITTED I/O Completion Ports API
$ lsdev -Cc iocp
iocp0 Defined8 I/O Completion Ports

很明显这就是IBM网站问题列表说的IOCP端口处于not available状态。正常状态应该是Available。根据上面的指导:
  1. Log in to the server as root and issue the following command: # smitty iocp
  2. Select "Change / Show Characteristics" of I/O Completion Ports. 
  3. Change the configured state at system restart from Defined to Available. 
  4. Reboot. 
  5. Enter the lsdev command again to confirm that the status of the IOCP port has changed to Available. 

设置并重启后IOCP的状态正常了。重启实例并检查db2diag.log未发现之前的错误信息。观察几分钟没有再出现IO、内存等很高的现象。


这里DB2还提供了一个参数:DB2_USE_IOCP,如果IOCP无效可以将该参数设置成OFF,但是一般都建议这样做。下面是信息中的说明:

DB2_USE_IOCP
  • Operating system: AIX 5.3 TL9 SP2, AIX 6.1 TL2, or AIX 7.1 or later.
  • Default: ON, Values: OFF or ON
  • Configure IOCP before enabling this registry variable.
This variable enables the use of AIX I/O completion ports (IOCP) when submitting and collecting asynchronous I/O (AIO) requests. This feature is used to enhance performance in a non-uniform memory access (NUMA) environment by avoiding remote memory access.


第二次处理:
实例正常后连接数据库,这是db2 connect to db_name执行后再次卡在那边,整个服务器又很卡,查看topas发现和之前类似的问题。hdisk0Busy%再次到100%,非计算内存也很高。第一反应是数据库可能在进行回滚之类的操作。但等待了有20分钟左右后还是没有连接上。同时查看db2diag.log也没发现错误:

1970-02-08-22.52.20.493437+480 I15561A511 LEVEL: Info
PID : 938198 TID : 6170 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : HEXECM
APPHDL : 0-9 APPID: *LOCAL.DB2.700208144919
AUTHID : DB2INST2
EDUID : 6170 EDUNAME: db2stmm (HEXECM) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterBufferPoolAct, probe:90
MESSAGE : Altering bufferpool "IBMDEFAULTBP" From: "1003" <automatic> To:
"1504" <automatic>

1970-02-08-22.56.24.675389+480 I16073A501 LEVEL: Event
PID : 938198 TID : 6170 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : HEXECM
APPHDL : 0-9 APPID: *LOCAL.DB2.700208144919
AUTHID : DB2INST2
EDUID : 6170 EDUNAME: db2stmm (HEXECM) 0
FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:20
CHANGE : STMM CFG DB HEXECM: "Database_memory" From: "41568" <automatic> To: "48500" <automatic>

1970-02-08-23.02.42.350262+480 I16575A511 LEVEL: Info
PID : 938198 TID : 6170 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : HEXECM
APPHDL : 0-9 APPID: *LOCAL.DB2.700208144919
AUTHID : DB2INST2
EDUID : 6170 EDUNAME: db2stmm (HEXECM) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterBufferPoolAct, probe:90
MESSAGE : Altering bufferpool "IBMDEFAULTBP" From: "1504" <automatic> To:
"2256" <automatic>

db2diag.log都是正常的提示信息,这就犯难了。后来没办法了,请老大出面处理。老大直接启动实例,连接数据库。没有过多检查就质问是否因为primary logs太大分配不出来。这话一出让我想到了在第一次排查时分析的db2diag.log中有logfilsiz调整的信息。找到了调整数值计算了下调整后一个日志文件是4G,如果按照logprimary10个(一般配置都比这个高)计算那么主日志需要40G的空间,而主日志空间是在数据库启动的时候进行分配的,也就是说在进行数据库连接的时候需要向硬盘申请40G的空间。天啊,这也太大了,本来就是一台虚拟机还一次性要申请几十G的空间难怪数据库连接不上了。

总结:

这次故障处理在第二次处理时犯了几个大错误
  1. 这次的故障在问题分析中的db2diag.log就可以发现在第6个脚注的地方进行实例重启后第7个脚注出现Not enough space to create primary logs,并且在第3个脚注的地方已经知道开发人员进行了logfilsiz调整,从这些信息中实际上可以分析出是因为数据库日志空间分配的问题导致数据库连接故障。
  2. 发现参数错误后实际上可以使用update db cfg for db_name using logfilsiz进行修改,但是因为不清楚有for这个语法所以直接将数据库删除。
  3. 和开发沟通不够,开发在态度上也不是太友好。他们自认为不是他们的问题。

存在疑点:

目前还不清楚是什么原因导致iocpAvailable变成Defined


1  appl应该是导入数据的。因为日志满而终止。
2  日志满错误信息。
3  日志大小从1024*4k修改成1048573*4k(4G),这是非常重要的信息,一个日志4G那么如果primary logs如果是10的话那么需要40G的主日志空间。
4  数据库实例停止,我进行操作的时间是06点左右,所以在这之前实例进行了stop
5  实例重启结束
6  非常重要的信息,主日志空间不足。这个信息可以结合第3点脚注进行分析
7  实例的停止消息,因为执行了db2stop force所以会持续出现该消息。
8  正常应该是Available

No comments:

Post a Comment