诊断集群性能
在进行数据库集群性能诊断时,可以采用多种方法来监控和分析系统状态。可以使用白屏监控工具如 OCP_Express 或 Prometheus 进行可视化监控,也可以通过直接运行 SQL 命令(黑屏操作)来获取实时性能数据。本文档将详细介绍如何通过 SQL 命令获取和分析 OceanBase 集群的性能指标,涵盖 CPU、内存、IO、磁盘等方面的内容,并提供相应的优化建议。
一、性能诊断概述
诊断集群性能的主要指标包括 CPU、内存、IO 和磁盘使用情况。通过分析这些指标的使用情况,可以了解系统瓶颈所在,并采取相应措施进行优化:
- CPU 占用过高:排查数据库正在执行的 SQL,通过限流或优化 SQL 来降低 CPU 使用率。
- 内存占用过高:
- 扩大租户内存分配。
- 调整转储(flush)线程数,加快数据从内存到磁盘的转储,以释放内存。
- 在无法扩容时,可调整 memstore 比例及转储阈值,临时扩大写入内存并降低转储阈值。
- 开启写入限速,限制客户端导入速度。
- 应用层面进行批处理任务降级。
- IO 占用过高:
- 暂停进行中的合并操作。
- 暂停执行中的备份任务、数据传输或导入/导出任务。
- 调整转储线程数、迁移复制并发数、后台任务带宽等参数。
- 对高负载 SQL 进行限流或优化。
- 取消进行中的索引创建等。
- 磁盘占用问题:
- 如果业务数据占用磁盘较高,可考虑扩容磁盘空间或清理不必要的数据。
- 日志数据占用较高时,可通过过期日志清理等手段降低占用。
二、连接集群并获取资源使用情况
1. 连接到集群
使用 root 用户登录 sys 租户:
obclient -h127.0.0.1 -uroot@sys -P2881 -Doceanbase -A
说明:此处 root 用户未设置密码,仅供体验使用。在实际环境中,请配置相关用户密码。
sys
租户用于系统级操作,实际运维中可根据需要切换至其他租户。
2. 查看 OBServer 资源分配和使用情况
获取 OBServer 的 CPU、内存、磁盘使用情况
通过如下 SQL 查询当前集群中 observer 主机的资源使用情况:
obclient [oceanbase]> SELECT
svr_ip AS ip,
CONCAT(cpu_capacity_max, 'C') AS cpu_total,
CONCAT(cpu_assigned_max, 'C') AS cpu_assigned,
CONCAT(TRUNCATE(mem_capacity / 1024 / 1024 / 1024, 2), 'G') AS mem_total,
CONCAT(TRUNCATE(mem_assigned / 1024 / 1024 / 1024, 2), 'G') AS mem_assigned,
CONCAT(TRUNCATE(data_disk_capacity / 1024 / 1024 / 1024, 2),'G') AS total_data_size,
CONCAT(TRUNCATE(log_disk_capacity / 1024 / 1024 / 1024, 2), 'G') AS total_log_size,
CONCAT(TRUNCATE((cpu_assigned_max / cpu_capacity_max) * 100, 2),'%') AS cpu_assigned_percent,
CONCAT(TRUNCATE((mem_assigned / mem_capacity) * 100, 2),'%') AS mem_assigned_percent,
CONCAT(TRUNCATE((data_disk_in_use / data_disk_capacity) * 100, 2),'%') AS used_data_percent,
CONCAT(TRUNCATE((log_disk_in_use / log_disk_capacity) * 100, 2),'%') AS used_log_percent
FROM
oceanbase.GV$OB_SERVERS;
查询结果示例:
+-----------+-----------+--------------+-----------+--------------+-----------------+----------------+----------------------+----------------------+-------------------+------------------+
| ip | cpu_total | cpu_assigned | mem_total | mem_assigned | total_data_size | total_log_size | cpu_assigned_percent | mem_assigned_percent | used_data_percent | used_log_percent |
+-----------+-----------+--------------+-----------+--------------+-----------------+----------------+----------------------+----------------------+-------------------+------------------+
| 127.0.0.1 | 16C | 2C | 5.00G | 3.00G | 8.00G | 13.00G | 12.50% | 60.00% | 2.70% | 26.92% |
+-----------+-----------+--------------+-----------+--------------+-----------------+----------------+----------------------+----------------------+-------------------+------------------+
说明:上述结果显示当前主机 CPU 分配率为 12.50%,内存分配率为 60.00%,数据盘使用率为 2.70%,日志盘使用率为 26.92%。
查询集群业务数据磁盘使用情况
obclient [oceanbase]> SELECT
CONCAT(TRUNCATE(data_disk_capacity/1024/1024/1024,2),'G') AS total_data_size,
CONCAT(TRUNCATE(data_disk_in_use/1024/1024/1024,2),'G') AS used_data_size,
CONCAT(TRUNCATE((data_disk_in_use/data_disk_capacity) * 100, 2),'%') AS used_data_percent,
CONCAT(TRUNCATE((data_disk_capacity - data_disk_in_use)/1024/1024/1024,2),'G') AS free_data_size,
CONCAT(TRUNCATE(((data_disk_capacity - data_disk_in_use)/data_disk_capacity)*100,2),'%') AS free_data_percent
FROM
oceanbase.GV$OB_SERVERS;
查询结果示例:
+-----------------+----------------+-------------------+----------------+-------------------+
| total_data_size | used_data_size | used_data_percent | free_data_size | free_data_percent |
+-----------------+----------------+-------------------+----------------+-------------------+
| 8.00G | 0.21G | 2.70% | 7.78G | 97.29% |
+-----------------+----------------+-------------------+----------------+-------------------+
说明:当前集群总的数据盘容量为 8.00G,已使用 0.21G,占比 2.70%,剩余 7.78G,占比 97.29%。
查询集群日志磁盘使用情况
obclient [oceanbase]> SELECT
CONCAT(TRUNCATE(log_disk_capacity/1024/1024/1024,2),'G') AS total_log_size,
CONCAT(TRUNCATE(log_disk_in_use/1024/1024/1024,2),'G') AS used_log_size,
CONCAT(TRUNCATE(log_disk_assigned/1024/1024/1024,2),'G') AS assigned_log_size,
CONCAT(TRUNCATE((log_disk_in_use/log_disk_capacity) * 100, 2),'%') AS used_log_percent,
CONCAT(TRUNCATE((log_disk_capacity - log_disk_assigned - log_disk_in_use)/1024/1024/1024,2),'G') AS free_log_size,
CONCAT(TRUNCATE(((log_disk_capacity - log_disk_assigned - log_disk_in_use)/log_disk_capacity)*100,2),'%') AS free_log_percent
FROM
oceanbase.GV$OB_SERVERS;
查询结果示例:
+----------------+---------------+-------------------+------------------+---------------+------------------+
| total_log_size | used_log_size | assigned_log_size | used_log_percent | free_log_size | free_log_percent |
+----------------+---------------+-------------------+------------------+---------------+------------------+
| 13.00G | 3.50G | 8.00G | 26.92% | 1.50G | 11.53% |
+----------------+---------------+-------------------+------------------+---------------+------------------+
说明:总日志盘容量为 13.00G,已使用 3.50G,已分配 8.00G,剩余 1.50G,占比 11.53%。
三、定位高 CPU 使用和资源分配情况
1. 查询 CPU 使用较高的 OBServer 和租户信息
当发现 CPU 使用较高时,可以使用如下 SQL 定位使用超过特定 CPU 数量的主机和租户:
obclient [oceanbase]> SELECT * FROM gv$sysstat
WHERE con_id > 1000
AND name = 'cpu usage'
AND value > 1000;
查询结果示例:
+--------+----------------+----------+------------+-----------+-------+-------+------------+---------+
| CON_ID | SVR_IP | SVR_PORT | STATISTIC# | NAME | CLASS | VALUE | VALUE_TYPE | STAT_ID |
+--------+----------------+----------+------------+-----------+-------+-------+------------+---------+
| 1101 | 10.xxx.xxx.xxx | 2882 | 370 | cpu usage | 64 | 1020 | SET_VALUE | 140006 |
+--------+----------------+----------+------------+-----------+-------+-------+------------+---------+
说明:从结果中可以看出,租户 ID 为 1101 的租户 CPU 使用较高。注意:在
cpu usage
中,每 100 单位表示 1 个 CPU 核心。示例中 VALUE 为 1020,相当于使用了 10.2 个 CPU 核心。当 CPU 负载很高时,可能会超过分配的核数,需要进一步优化 SQL 或调整资源分配。
2. 获取各租户业务数据磁盘占用情况
当业务数据盘占用较高时,可使用如下 SQL 查询各租户的业务数据磁盘使用情况:
obclient [oceanbase]> SELECT
TENANT_ID,
CONCAT(TRUNCATE(LOG_DISK_SIZE/1024/1024/1024,2),'G') AS total_data_size,
CONCAT(TRUNCATE(LOG_DISK_IN_USE/1024/1024/1024,2),'G') AS used_data_size,
CONCAT(TRUNCATE((LOG_DISK_IN_USE/LOG_DISK_SIZE) * 100, 2),'%') AS used_data_percent,
CONCAT(TRUNCATE((LOG_DISK_SIZE - LOG_DISK_IN_USE)/1024/1024/1024,2),'G') AS free_data_size,
CONCAT(TRUNCATE(((LOG_DISK_SIZE - LOG_DISK_IN_USE)/LOG_DISK_SIZE)*100,2),'%') AS free_data_percent
FROM
oceanbase.GV$OB_UNITS;
查询结果示例:
+-----------+-----------------+----------------+-------------------+----------------+-------------------+
| TENANT_ID | total_data_size | used_data_size | used_data_percent | free_data_size | free_data_percent |
+-----------+-----------------+----------------+-------------------+----------------+-------------------+
| 1 | 2.00G | 1.54G | 77.18% | 0.45G | 22.81% |
| 1001 | 0.59G | 0.45G | 75.88% | 0.14G | 24.11% |
| 1002 | 5.40G | 1.36G | 25.24% | 4.03G | 74.75% |
+-----------+-----------------+----------------+-------------------+----------------+-------------------+
说明:上述结果列出了各租户的业务数据盘分配、已使用和剩余情况。
3. 查询各租户的资源使用情况
可以使用如下 SQL 查询具体的 CPU、内存、数据盘和日志盘的使用情况:
obclient [oceanbase]> SELECT
COALESCE(t1.TENANT_ID, -1) AS TENANT_ID,
TENANT_NAME,
SUM(MAX_CPU) AS MAX_CPU,
SUM(MIN_CPU) AS MIN_CPU,
SUM(MAX_MEMORY) AS MAX_MEMORY,
SUM(MIN_MEMORY) AS MIN_MEMORY,
SUM(MAX_IOPS) AS MAX_IOPS,
SUM(MIN_IOPS) AS MIN_IOPS,
SUM(DATA_DISK_IN_USE) AS DATA_DISK_IN_USE,
SUM(LOG_DISK_IN_USE) AS LOG_DISK_IN_USE
FROM
(
SELECT
T1.UNIT_ID,
T1.SVR_IP,
T1.SVR_PORT,
T1.ZONE,
T2.TENANT_ID,
T1.MIN_CPU,
T1.MAX_CPU,
T1.MIN_MEMORY,
T1.MAX_MEMORY,
T1.MIN_IOPS,
T1.MAX_IOPS,
T1.IOPS_WEIGHT,
T1.DATA_DISK_IN_USE,
T1.LOG_DISK_IN_USE
FROM
(
SELECT
UNIT_ID,
SVR_IP,
SVR_PORT,
ZONE,
SUM(MIN_CPU) AS MIN_CPU,
SUM(MAX_CPU) AS MAX_CPU,
SUM(MEMORY_SIZE) AS MIN_MEMORY,
SUM(MEMORY_SIZE) AS MAX_MEMORY,
SUM(MIN_IOPS) AS MIN_IOPS,
SUM(MAX_IOPS) AS MAX_IOPS,
SUM(IOPS_WEIGHT) AS IOPS_WEIGHT,
SUM(DATA_DISK_IN_USE) AS DATA_DISK_IN_USE,
SUM(LOG_DISK_IN_USE) AS LOG_DISK_IN_USE
FROM
oceanbase.GV$OB_UNITS
GROUP BY
UNIT_ID
) T1
JOIN oceanbase.DBA_OB_UNITS T2 ON T1.UNIT_ID = T2.UNIT_ID
) t1
JOIN oceanbase.DBA_OB_TENANTS t2 ON t1.TENANT_ID = t2.TENANT_ID
WHERE
TENANT_TYPE IN ('SYS', 'USER')
GROUP BY
TENANT_ID;
查询结果示例:
+-----------+-------------+---------+---------+------------+------------+---------------------+---------------------+------------------+-----------------+
| TENANT_ID | TENANT_NAME | MAX_CPU | MIN_CPU | MAX_MEMORY | MIN_MEMORY | MAX_IOPS | MIN_IOPS | DATA_DISK_IN_USE | LOG_DISK_IN_USE |
+-----------+-------------+---------+---------+------------+------------+---------------------+---------------------+------------------+-----------------+
| 1 | sys | 1 | 1 | 1073741824 | 1073741824 | 9223372036854775807 | 9223372036854775807 | 109051904 | 1657930234 |
| 1002 | mq_t1 | 1 | 1 | 2147483648 | 2147483648 | 10000 | 10000 | 81788928 | 1953413448 |
+-----------+-------------+---------+---------+------------+------------+---------------------+---------------------+------------------+-----------------+
说明:从结果中可以了解各租户的最大/最小 CPU、内存、IOPS 以及数据盘、日志盘的使用情况。
四、查看 OBServer 运行情况与 Unit 信息
1. 查看 OBServer 运行情况
obclient [oceanbase]> SELECT
zone,
svr_ip,
svr_port,
sql_port AS inner_port,
with_rootserver,
UPPER(`status`) AS `status`,
build_version,
stop_time * 1000000 AS stop_time,
start_service_time * 1000000 AS start_service_time,
last_offline_time * 1000000 AS last_offline_time,
block_migrate_in_time * 1000000 AS block_migrate_in_time
FROM
oceanbase.DBA_OB_SERVERS;
查询结果示例:
+-------+-----------+----------+------------+-----------------+--------+-------------------------------------------------------------------------------------------+-----------+-----------------------------+-------------------+-----------------------+
| zone | svr_ip | svr_port | inner_port | with_rootserver | status | build_version | stop_time | start_service_time | last_offline_time | block_migrate_in_time |
+-------+-----------+----------+------------+-----------------+--------+-------------------------------------------------------------------------------------------+-----------+-----------------------------+-------------------+-----------------------+
| zone1 | 127.0.0.1 | 2882 | 2881 | YES | ACTIVE | 4.2.1.0_100000102023092807-... | NULL | 20231024113326843982.000000 | NULL | NULL |
+-------+-----------+----------+------------+-----------------+--------+-------------------------------------------------------------------------------------------+-----------+-----------------------------+-------------------+-----------------------+
说明:该 OBServer 位于
zone1
,IP 为127.0.0.1
,端口等信息显示正常,状态为 ACTIVE,版本为 V4.2.1。
2. 查询 Unit 列表和资源池信息
obclient [oceanbase]> SELECT
/*+ QUERY_TIMEOUT(60000000) */
t1.MODIFY_TIME AS UPDATE_TIME,
t1.RESOURCE_POOL_ID,
t1.NAME,
t1.UNIT_COUNT,
t1.UNIT_CONFIG_ID,
t1.ZONE_LIST,
t1.TENANT_ID,
t1.REPLICA_TYPE,
t2.NAME AS UNIT_CONFIG_NAME,
t2.MAX_CPU,
t2.MIN_CPU,
CONCAT(TRUNCATE(t2.MEMORY_SIZE/1024/1024/1024,2),'G'),
t2.MAX_IOPS,
t2.MIN_IOPS,
CONCAT(TRUNCATE(t2.LOG_DISK_SIZE/1024/1024/1024,2),'G')
FROM
oceanbase.DBA_OB_RESOURCE_POOLS AS t1
JOIN oceanbase.DBA_OB_UNIT_CONFIGS AS t2 ON t1.UNIT_CONFIG_ID = t2.UNIT_CONFIG_ID;
查询结果示例:
+----------------------------+------------------+------------+------------+----------------+-----------+-----------+--------------+------------------+---------+---------+-----------------------+---------------------+---------------------+-------------------------+
| UPDATE_TIME | RESOURCE_POOL_ID | NAME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | TENANT_ID | REPLICA_TYPE | UNIT_CONFIG_NAME | MAX_CPU | MIN_CPU | CONCAT(...) | MAX_IOPS | MIN_IOPS | CONCAT(...) |
+----------------------------+------------------+------------+------------+----------------+-----------+-----------+--------------+------------------+---------+---------+-----------------------+---------------------+---------------------+-------------------------+
| 2023-08-29 11:51:52.314928 | 1 | sys_pool | 1 | 1 | zone1 | 1 | FULL | sys_unit_config | 1 | 1 | 1.00G | 9223372036854775807 | 9223372036854775807 | 2.00G |
| 2023-08-29 15:19:10.329077 | 1001 | mq_pool_01 | 1 | 1001 | zone1 | 1002 | FULL | s1_unit_config | 1 | 1 | 2.00G | 10000 | 10000 | 6.00G |
+----------------------------+------------------+------------+------------+----------------+-----------+-----------+--------------+------------------+---------+---------+-----------------------+---------------------+---------------------+-------------------------+
说明:结果展示了各资源池的更新时间、资源分配情况等信息,如 'mq_pool_01' 资源池关联的租户为 ID 1002、配置为 2.00G 内存、10000 IOPS、6.00G 日志盘大小等。
3. 统计未使用的 unit
obclient [oceanbase]> SELECT
/*+ QUERY_TIMEOUT(5000000) */
t1.UNIT_ID,
t1.RESOURCE_POOL_ID,
t1.ZONE,
t1.SVR_IP,
t1.SVR_PORT,
t1.STATUS,
t2.REPLICA_TYPE,
NULL AS TENANT_ID,
NULL AS TENANT_NAME,
t2.NAME AS RESOURCE_POOL_NAME,
t2.MODIFY_TIME AS RESOURCE_POOL_UPDATE_TIME,
t1.MIGRATE_FROM_SVR_IP,
t1.MIGRATE_FROM_SVR_PORT,
t1.MANUAL_MIGRATE
FROM
oceanbase.DBA_OB_UNITS AS t1
JOIN oceanbase.DBA_OB_RESOURCE_POOLS AS t2 ON t1.RESOURCE_POOL_ID = t2.RESOURCE_POOL_ID
WHERE
t2.TENANT_ID IS NULL;
说明:上述查询用于定位未被租户使用的 unit。如果结果为空,表明当前集群所有的 unit 都在使用中。
查询结果:
Empty set
五、查看内存和缓存使用情况
1. 查看 MEMStore 使用情况
obclient [oceanbase]> SELECT
/*+ READ_CONSISTENCY(WEAK), query_timeout(100000000) */
TENANT_ID,
SVR_IP,
ROUND(ACTIVE_SPAN / 1024 / 1024 / 1024, 2) AS ACTIVE_GB,
ROUND(MEMSTORE_USED / 1024 / 1024 / 1024, 2) AS TOTAL_GB,
ROUND(FREEZE_TRIGGER / 1024 / 1024 / 1024, 2) AS FREEZE_TRIGGER_GB,
ROUND(MEMSTORE_USED / FREEZE_TRIGGER * 100, 2) AS percent_trigger,
ROUND(MEMSTORE_LIMIT / 1024 / 1024 / 1024, 2) AS MEM_LIMIT_GB
FROM
V$OB_MEMSTORE
WHERE
TENANT_ID > 1000
OR TENANT_ID = 1
ORDER BY
TENANT_ID,
TOTAL_GB DESC;
查询结果示例:
+-----------+-----------+-----------+----------+-------------------+-----------------+--------------+
| TENANT_ID | SVR_IP | ACTIVE_GB | TOTAL_GB | FREEZE_TRIGGER_GB | percent_trigger | MEM_LIMIT_GB |
+-----------+-----------+-----------+----------+-------------------+-----------------+--------------+
| 1 | 127.0.0.1 | 0.07 | 0.07 | 0.13 | 53.69 | 0.50 |
| 1001 | 127.0.0.1 | 0.06 | 0.06 | 0.13 | 49.11 | 0.50 |
| 1002 | 127.0.0.1 | 0.01 | 0.01 | 0.13 | 10.63 | 0.50 |
+-----------+-----------+-----------+----------+-------------------+-----------------+--------------+
说明:此查询展示了各租户 MemStore 的内存使用情况,包括活跃 MemTable 内存、总使用内存、触发冻结的内存大小等。
2. 查看各模块内存占用情况
obclient [oceanbase]> SELECT * FROM V$OB_MEMORY ORDER BY hold DESC LIMIT 10;
查询结果示例:
+-----------+-----------+----------+-------------------+-----------------+-------+-----------+-----------+
| TENANT_ID | SVR_IP | SVR_PORT | CTX_NAME | MOD_NAME | COUNT | HOLD | USED |
+-----------+-----------+----------+-------------------+-----------------+-------+-----------+-----------+
| 500 | 127.0.0.1 | 2882 | UNEXPECTED_IN_500 | FixeSizeBlocAll | 6 | 134308352 | 134226432 |
| 500 | 127.0.0.1 | 2882 | DEFAULT_CTX_ID | CACHE_INST | 2 | 113709056 | 113680448 |
| 1 | 127.0.0.1 | 2882 | CO_STACK | CoStack | 188 | 97026048 | 96844064 |
| 1002 | 127.0.0.1 | 2882 | CO_STACK | CoStack | 179 | 92381184 | 92207912 |
| 1001 | 127.0.0.1 | 2882 | CO_STACK | CoStack | 163 | 84123648 | 83965864 |
| 1002 | 127.0.0.1 | 2882 | DEFAULT_CTX_ID | LogGroupBuffer | 2 | 83927040 | 83886080 |
| 1 | 127.0.0.1 | 2882 | MEMSTORE_CTX_ID | Memstore | 36 | 74907648 | 74873376 |
| 500 | 127.0.0.1 | 2882 | CO_STACK | CoStack | 136 | 70189056 | 70057408 |
| 1001 | 127.0.0.1 | 2882 | MEMSTORE_CTX_ID | Memstore | 33 | 68665344 | 68633928 |
| 500 | 127.0.0.1 | 2882 | SCHEMA_SERVICE | SchemaSysCache | 8314 | 67840672 | 67234620 |
+-----------+-----------+----------+-------------------+-----------------+-------+-----------+-----------+
说明:结果列出了各模块的内存占用详情,
HOLD
字段表示占用内存最大值。通过分析这些数据,可以识别内存占用高的模块进行针对性优化。
3. 查看 KVCACHE 使用情况
obclient [oceanbase]> SELECT
tenant_id,
svr_ip,
cache_name,
CONCAT(ROUND(cache_size / 1024 / 1024 / 1024, 2), 'G') AS cache_size
FROM
V$OB_KVCACHE
ORDER BY
1, 2, 4 DESC;
查询结果示例:
+-----------+-----------+-----------------------+------------+
| tenant_id | svr_ip | cache_name | cache_size |
+-----------+-----------+-----------------------+------------+
| 1 | 127.0.0.1 | vtable_cache | 0.01G |
| 1 | 127.0.0.1 | index_block_cache | 0.01G |
| 1 | 127.0.0.1 | user_block_cache | 0.01G |
| 1 | 127.0.0.1 | user_row_cache | 0.01G |
| 1 | 127.0.0.1 | fuse_row_cache | 0.01G |
| 1 | 127.0.0.1 | storage_meta_cache | 0.01G |
| 1 | 127.0.0.1 | opt_table_stat_cache | 0.01G |
| 1 | 127.0.0.1 | opt_column_stat_cache | 0.01G |
| 1 | 127.0.0.1 | opt_ds_stat_cache | 0.01G |
| 1 | 127.0.0.1 | schema_cache | 0.01G |
| 1001 | 127.0.0.1 | storage_meta_cache | 0.01G |
| 1001 | 127.0.0.1 | opt_table_stat_cache | 0.01G |
| 1001 | 127.0.0.1 | fuse_row_cache | 0.01G |
| 1001 | 127.0.0.1 | opt_column_stat_cache | 0.01G |
| 1001 | 127.0.0.1 | user_row_cache | 0.01G |
| 1001 | 127.0.0.1 | user_block_cache | 0.01G |
| 1001 | 127.0.0.1 | index_block_cache | 0.01G |
| 1002 | 127.0.0.1 | opt_table_stat_cache | 0.01G |
| 1002 | 127.0.0.1 | storage_meta_cache | 0.01G |
| 1002 | 127.0.0.1 | opt_column_stat_cache | 0.01G |
| 1002 | 127.0.0.1 | fuse_row_cache | 0.01G |
| 1002 | 127.0.0.1 | bf_cache | 0.01G |
| 1002 | 127.0.0.1 | user_row_cache | 0.01G |
| 1002 | 127.0.0.1 | user_block_cache | 0.01G |
| 1002 | 127.0.0.1 | index_block_cache | 0.01G |
+-----------+-----------+-----------------------+------------+
说明:查询结果展示了各租户和不同缓存名称对应的缓存占用情况。通过分析 KVCACHE 的使用情况,可以发现各租户缓存压力,并优化缓存使用策略。
六、查询会话与系统事件信息
1. 查询活跃会话数
查询各租户下的活跃会话数:
obclient [oceanbase]> SELECT
CASE WHEN cnt IS NULL THEN 0 ELSE cnt END AS cnt,
tenant_name,
tenant_id
FROM
(
SELECT
DBA_OB_TENANTS.tenant_name,
DBA_OB_TENANTS.tenant_id,
cnt
FROM
DBA_OB_TENANTS
LEFT JOIN (
SELECT
COUNT(`state` = 'ACTIVE' OR NULL) AS cnt,
tenant AS tenant_name
FROM
GV$OB_PROCESSLIST
WHERE
svr_ip = '10.xxx.xxx.xxx'
AND svr_port = '2882'
GROUP BY
tenant
) t1 ON DBA_OB_TENANTS.tenant_name = t1.tenant_name
WHERE
DBA_OB_TENANTS.tenant_type <> 'META'
) t2;
查询结果示例:
+------+-------------+-----------+
| cnt | tenant_name | tenant_id |
+------+-------------+-----------+
| 0 | sys | 1 |
| 0 | mq_t1 | 1002 |
+------+-------------+-----------+
说明:当前系统中各租户的活跃会话数。此例中 'mq_t1' 租户没有活跃的会话。
2. 查询不同主机的实时会话数
obclient [oceanbase]> SELECT
user,
host,
state,
COUNT(*)
FROM
GV$OB_PROCESSLIST
WHERE
state <> 'SLEEP'
GROUP BY
user, host, state
ORDER BY
host
LIMIT 50;
查询结果示例:
+------+-----------------+--------+----------+
| user | host | state | count(*) |
+------+-----------------+--------+----------+
| root | 127.0.0.1:47464 | ACTIVE | 1 |
+------+-----------------+--------+----------+
说明:此查询显示了不同主机上正在进行的非空闲会话数,帮助定位会话热点。
3. 查询合并和转储相关信息
sys 租户查询所有租户的合并状态
obclient [oceanbase]> SELECT
tenant_id,
global_broadcast_scn AS broadcast_scn,
is_error AS error,
status,
frozen_scn,
last_scn,
is_suspended AS suspend,
info,
start_time,
last_finish_time
FROM
CDB_OB_MAJOR_COMPACTION;
查询结果示例:
+-----------+---------------------+-------+--------+---------------------+---------------------+---------+------+----------------------------+----------------------------+
| tenant_id | broadcast_scn | error | status | frozen_scn | last_scn | suspend | info | start_time | last_finish_time |
+-----------+---------------------+-------+--------+---------------------+---------------------+---------+------+----------------------------+----------------------------+
| 1 | 1693764004565141903 | NO | IDLE | 1693764004565141903 | 1693764004565141903 | NO | | 2023-09-04 02:00:04.596021 | 2023-09-04 02:01:46.393845 |
| 1001 | 1693764000319375220 | NO | IDLE | 1693764000319375220 | 1693764000319375220 | NO | | 2023-09-04 02:00:00.345588 | 2023-09-04 02:01:06.956234 |
| 1002 | 1693764002360017495 | NO | IDLE | 1693764002360017495 | 1693764002360017495 | NO | | 2023-09-04 02:00:02.383296 | 2023-09-04 02:01:46.740004 |
+-----------+---------------------+-------+--------+---------------------+---------------------+---------+------+----------------------------+----------------------------+
说明:此查询展示了各租户的合并状态、错误状态、冻结版本号、状态、以及最后一次合并的开始和结束时间。
普通租户下查询当前租户的合并状态
obclient [oceanbase]> SELECT
global_broadcast_scn AS broadcast_scn,
is_error AS error,
status,
frozen_scn,
last_scn,
is_suspended AS suspend,
info,
start_time,
last_finish_time
FROM
DBA_OB_MAJOR_COMPACTION;
说明:此查询适用于普通租户环境下,仅返回当前租户的合并状态信息。
获取租户的合并进度
obclient [oceanbase]> SELECT
TENANT_ID,
COMPACTION_SCN,
CONCAT(100 * (1 - SUM(UNFINISHED_TABLET_COUNT) / SUM(TOTAL_TABLET_COUNT)),'%') AS progress
FROM
GV$OB_COMPACTION_PROGRESS
GROUP BY
TENANT_ID, COMPACTION_SCN;
查询结果示例:
+-----------+---------------------+-----------+
| TENANT_ID | COMPACTION_SCN | progress |
+-----------+---------------------+-----------+
| 1 | 1689876004916198873 | 100.0000% |
+-----------+---------------------+-----------+
说明:此查询显示当前租户的合并进度,此例中进度为 100%。
查询所有租户的 tablet 转储历史信息
obclient [oceanbase]> SELECT
tenant_id,
start_time AS min_start_time,
finish_time AS max_finish_time,
SUM(occupy_size) AS occupy_size,
SUM(total_row_count) AS total_row_count,
COUNT(1) AS tablet_count
FROM
GV$OB_TABLET_COMPACTION_HISTORY
GROUP BY
tenant_id;
查询结果示例:
+-----------+----------------------------+----------------------------+-------------+-----------------+--------------+
| tenant_id | min_start_time | max_finish_time | occupy_size | total_row_count | tablet_count |
+-----------+----------------------------+----------------------------+-------------+-----------------+--------------+
| 1 | 2023-08-30 02:01:06.145283 | 2023-08-30 02:01:06.151083 | 2730712096 | 30365311 | 27312 |
+-----------+----------------------------+----------------------------+-------------+-----------------+--------------+
说明:此查询展示了各租户的 tablet 转储历史信息,包括转储开始和结束时间、占用空间、行数及 tablet 数量。
七、查询 RS(RootService)相关信息
1. 查询租户 RS 任务列表
obclient [oceanbase]> SELECT
TENANT_ID,
JOB_ID,
JOB_TYPE,
JOB_STATUS,
PROGRESS
FROM
oceanbase.DBA_OB_TENANT_JOBS
ORDER BY
JOB_ID DESC;
说明:此查询用于查看租户 RS 任务列表,以判断内核任务是否完成。特别关注
JOB_TYPE
为ALTER_TENANT_LOCALITY
或SHRINK_RESOURCE_POOL_UNIT_NUM
的任务。
2. 查询 RS 历史事件信息
obclient [oceanbase]> SELECT
/*+QUERY_TIMEOUT(60000000) */
timestamp AS gmt_create,
module,
event,
name1,
value1,
name2,
value2,
name3,
value3,
name4,
value4,
name5,
value5,
name6,
value6,
extra_info,
rs_svr_ip,
rs_svr_port
FROM
DBA_OB_ROOTSERVICE_EVENT_HISTORY
ORDER BY
timestamp DESC
LIMIT 20;
查询结果示例(部分):
+----------------------------+-----------------+-------------------------+-----------+--------+------------------------+------------------+------------+-----------------------------------+-------+--------+-------+--------+-------+--------+------------+----------------+-------------+
| gmt_create | module | event | name1 | value1 | name2 | value2 | name3 | value3 | name4 | value4 | name5 | value5 | name6 | value6 | extra_info | rs_svr_ip | rs_svr_port |
+----------------------------+-----------------+-------------------------+-----------+--------+------------------------+------------------+------------+-----------------------------------+-------+--------+-------+--------+-------+--------+------------+----------------+-------------+
| 2023-07-31 11:34:39.697359 | schema_recycler | batch_recycle_by_tenant | tenant_id | 1004 | recycle_schema_version | 1690164797923912 | cost | 61498 | | | | | | | | 10.xxx.xxx.xxx | 2882 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
+----------------------------+-----------------+-------------------------+-----------+--------+------------------------+------------------+------------+-----------------------------------+-------+--------+-------+--------+-------+--------+------------+----------------+-------------+
说明:此查询展示了 RS 历史事件信息,可以用于审计和排查系统事件。
八、权限查询
1. 查询用户列表与全局权限授权情况
obclient [oceanbase]> SELECT
`user`,
(CASE `account_locked` WHEN 'Y' THEN 1 ELSE 0 END) AS account_locked,
`select_priv`,
`insert_priv`,
`update_priv`,
`delete_priv`,
`create_priv`,
`drop_priv`,
`process_priv`,
`grant_priv`,
`index_priv`,
`alter_priv`,
`show_db_priv`,
`super_priv`,
`create_view_priv`,
`show_view_priv`,
`create_user_priv`,
`password`
FROM
`mysql`.`user`;
查询结果示例:
+-------------+----------------+-------------+-------------+-------------+-------------+-------------+-----------+--------------+------------+------------+------------+--------------+------------+------------------+----------------+------------------+----------+
| user | account_locked | select_priv | insert_priv | update_priv | delete_priv | create_priv | drop_priv | process_priv | grant_priv | index_priv | alter_priv | show_db_priv | super_priv | create_view_priv | show_view_priv | create_user_priv | password |
+-------------+----------------+-------------+-------------+-------------+-------------+-------------+-----------+--------------+------------+------------+------------+--------------+------------+------------------+----------------+------------------+----------+
| root | 0 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| proxyro | 0 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | |
| ocp_monitor | 0 | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | |
+-------------+----------------+-------------+-------------+-------------+-------------+-------------+-----------+--------------+------------+------------+------------+--------------+------------+------------------+----------------+------------------+----------+
2. 查询数据库权限授权情况
obclient [oceanbase]> SELECT
`db`,
`user`,
`select_priv`,
`insert_priv`,
`update_priv`,
`delete_priv`,
`create_priv`,
`drop_priv`,
`index_priv`,
`alter_priv`,
`create_view_priv`,
`show_view_priv`
FROM
`mysql`.`db`;
查询结果示例:
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+------------+------------+------------------+----------------+
| db | user | select_priv | insert_priv | update_priv | delete_priv | create_priv | drop_priv | index_priv | alter_priv | create_view_priv | show_view_priv |
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+------------+------------+------------------+----------------+
| mysql | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| oceanbase | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| test | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| information_schema | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| __public | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| __recyclebin | root | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| oceanbase | proxyro | Y | N | N | N | N | N | N | N | N | N |
| oceanbase | ocp_monitor | Y | N | N | N | N | N | N | N | N | N |
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+------------+------------+------------------+----------------+
说明:通过此查询可以查看各数据库下不同用户的权限分配情况。
总结
通过上述各类 SQL 查询,管理员可以获取 OceanBase 集群的详细性能、资源使用、会话、合并状态、内存与缓存使用以及权限分配情况。基于这些数据,可以全面地分析集群的运行状况,识别潜在瓶颈,并采取相应的优化措施,以确保集群高效稳定运行。