diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md index ac8e6c0da..b31477b4f 100644 --- a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -275,7 +275,7 @@ On each server, create two YML files: `confignode.yml` and `datanode.yml`. Examp version: "3" services: iotdb-confignode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-confignode command: ["bash", "-c", "entrypoint.sh confignode"] @@ -310,7 +310,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 3bafe066a..000000000 --- a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,694 +0,0 @@ - -# Monitoring Panel Deployment - -The monitoring panel is one of the supporting tools for IoTDB. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring metrics, in order to help users monitor cluster health, optimize performance, and perform maintenance. 
This guide demonstrates how to enable the system monitoring module in an IoTDB instance and visualize monitoring metrics with Prometheus and Grafana, using a typical 3C3D cluster (3 ConfigNodes and 3 DataNodes) as an example. - -## 1. Installation Preparation - -1. Install IoTDB: Install IoTDB V1.0 or above. Contact sales or technical support to obtain the installation package. - -2. Obtain the monitoring panel installation package: The monitoring panel is exclusive to the enterprise-grade IoTDB. Contact sales or technical support to obtain it. - -## 2. Installation Steps - -### 2.1 Enable Monitoring Metrics Collection in IoTDB - -1. Enable related configuration options. The configuration options related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to enable certain configuration options (note that the service must be restarted after the monitoring configuration is enabled). - -| **Configuration** | **Configuration File** | **Description** | -| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-confignode.properties | Uncomment the configuration option and set the value to PROMETHEUS | -| cn_metric_level | conf/iotdb-confignode.properties | Uncomment the configuration option and set the value to IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-confignode.properties | Uncomment the configuration option and keep the default port `9091` or set another port (ensure no conflict) | -| dn_metric_reporter_list | conf/iotdb-datanode.properties | Uncomment the configuration option and set the value to PROMETHEUS | -| dn_metric_level | conf/iotdb-datanode.properties | Uncomment the configuration option and set the value to IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-datanode.properties | Uncomment the configuration option and keep the default port `9092` or set another 
port (ensure no conflict) | - -Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: - -| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | -| ----------- | --------- | ------------ | ---------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. Restart all nodes. 
After modifying the monitoring configurations on all 3 nodes, restart the ConfigNode and DataNode services: - -```Bash - # Unix/OS X - ./sbin/stop-standalone.sh #Stop confignode and datanode first - ./sbin/start-confignode.sh -d #Start confignode - ./sbin/start-datanode.sh -d #Start datanode - - # Windows - # Before version V2.0.4.x - .\sbin\stop-standalone.bat - .\sbin\start-confignode.bat - .\sbin\start-datanode.bat - - # V2.0.4.x and later versions - .\sbin\windows\stop-standalone.bat - .\sbin\windows\start-confignode.bat - .\sbin\windows\start-datanode.bat - ``` - -3. After restarting, confirm the running status of each node through the client. If all nodes are running, the configuration is successful. - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 2.2 Install and Configure Prometheus - -> In this example, Prometheus is installed on server 192.168.1.3. - -1. Download Prometheus (version 2.30.3 or later). You can download it on Prometheus homepage (https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the folder: - -```Shell - tar xvfz prometheus-*.tar.gz - cd prometheus-* - ``` - -3. Modify the configuration. Modify the configuration file `prometheus.yml` as follows - - Add a confignode job to collect monitoring data for ConfigNode - - Add a datanode job to collect monitoring data for DataNodes - -```YAML -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. 
In production environments, it is recommended to increase it to 180 days or more so that historical monitoring data can be tracked over a longer period. The startup command is as follows: - -```Shell - ./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d - ``` - -5. Confirm successful startup. Open a browser and navigate to http://192.168.1.3:9090 . Navigate to "Status" -> "Targets". If all targets are in the UP state, the configuration is successful. - -
 - -6. Click the links in the `Targets` page to view monitoring information for the respective nodes. - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 2.3 Install Grafana and Configure the Data Source - -> In this example, Grafana is installed on server 192.168.1.3. - -1. Download Grafana (version 8.4.2 or later). You can download it from the Grafana download page (https://grafana.com/grafana/download) - -2. Unzip the installation package and enter the folder: - -```Shell - tar -zxvf grafana-*.tar.gz - cd grafana-* - ``` - -3. Start Grafana: - -```Shell - ./bin/grafana-server web - ``` - -4. Log in to Grafana. Open a browser and navigate to `http://192.168.1.3:3000` (or the modified port). The default initial username and password are both `admin`. - -5. Configure data sources. Navigate to "Connections" -> "Data sources", add a new data source, and select `Prometheus` as the data source. - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -Ensure the URL for Prometheus is correct. Click "Save & Test". If the message "Data source is working" appears, the configuration is successful. - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 2.4 Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import the dashboard by uploading a JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Choose one of the JSON files (e.g., `Apache IoTDB ConfigNode Dashboard`). - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. Choose Prometheus as the data source and click "Import" - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. The imported `Apache IoTDB ConfigNode Dashboard` will now be displayed. - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. 
Similarly, import other dashboards such as `Apache IoTDB DataNode Dashboard`, `Apache Performance Overview Dashboard`, and `Apache System Overview Dashboard`. - -
- -8. The IoTDB monitoring panel is now fully imported, and you can view monitoring information at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. Appendix, Detailed Explanation of Monitoring Indicators - -### 3.1 System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Cores:CPU cores -- CPU Utilization: - - System CPU Utilization:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Utilization:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited VM Size: The size of virtual memory allocated by the operating system to running processes. - - Total Physical Memory:The total amount of available physical memory in the system. - - Used Physical Memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total Disk Space:The maximum disk space that IoTDB can use. - - Used Disk Space:The disk space already used by IoTDB. -- Logs Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. 
-- File Count:Number of IoTDB related files - - All:All file quantities - - TsFile:Number of TsFiles - - Seq:Number of sequential TsFiles - - Unseq:Number of unsequence TsFiles - - WAL:Number of WAL files - - Cross-Temp:Number of cross space merge temp files - - Inner-Seq-Temp:Number of merged temp files in sequential space - - Innsr-Unseq-Temp:Number of merged temp files in unsequential space - - Mods:Number of tombstone files -- Open File Handles:Number of file handles opened by the system -- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. -- Disk Utilization (%):Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. -- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk IOPS:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Latency (Avg):Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. -- Disk I/O Request Size (Avg):Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Queue Length (Avg):Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O Syscall Rate:The frequency of process calls to read and write system calls, similar to IOPS. 
-- I/O Throughput: The throughput of process I/O, divided into two categories: actual-read/write and attempt-read/write. Actual read and actual write refer to the number of bytes for which the process actually causes block devices to perform I/O, excluding the parts served by the Page Cache. - -#### JVM - -- GC Time Percentage: The proportion of time the node JVM spent on GC within the past one-minute window -- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute by the node JVM, as well as the size of objects newly allocated in the young generation, the old generation, and non-generational spaces -- GC Live Data Size: The size of long-lived objects in the node JVM and the maximum size allowed for the long-lived (old generation) memory pool -- Heap Memory: JVM heap memory usage. - - Maximum heap memory: The maximum available heap memory size for the JVM. - - Committed heap memory: The size of heap memory that has been committed by the JVM. - - Used heap memory: The size of heap memory already used by the JVM. - - PS Eden Space: The size of the PS Young area. - - PS Old Space: The size of the PS Old area. - - PS Survivor Space: The size of the PS survivor area. - - ...(CMS/G1/ZGC, etc) -- Off-Heap Memory: Off-heap memory usage. - - Direct Memory: Off-heap direct memory. - - Mapped Memory: Off-heap mapped memory. 
-- GCs Per Minute: The average number of garbage collections per minute by the node JVM, including YGC and FGC -- GC Latency Per Minute: The average time the node JVM spends on garbage collection per minute, including YGC and FGC -- GC Events Breakdown Per Minute: The average number of garbage collections per minute by the node JVM, broken down by cause, including YGC and FGC -- GC Pause Time Breakdown Per Minute: The average time the node JVM spends on garbage collection per minute, broken down by cause, including YGC and FGC -- JIT Compilation Time Per Minute: The total time the JVM spends on compilation per minute -- Loaded & Unloaded Classes: - - Loaded: The number of classes currently loaded by the JVM - - Unloaded: The number of classes unloaded by the JVM since system startup -- Active Java Threads: The current number of live threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -`eno` refers to the network interface connected to the public network, while `lo` refers to the virtual (loopback) network interface. 
- -- Network Speed:The speed of network card sending and receiving data -- Network Throughput (Receive/Transmit):The size of data packets sent or received by the network card, calculated from system restart -- Packet Transmission Rate:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Active TCP Connections:The current number of socket connections for the selected process (IoTDB only has TCP) - -### 3.2 Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Cores:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster -- Total Time Series: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Throughput: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster -- Total Files:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBases: The total number of databases managed by the cluster (including replicas) -- Total DataRegions: The total number of DataRegions managed by the cluster -- Total SchemaRegions: The total number of 
SchemaRegions managed by the cluster - -#### Node Overview - -- CPU Cores: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Time Series: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Throughput: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory: The swap memory size of the machine where the node is located -- File Count: Number of files managed by the node - -#### Performance - -- Session Idle Time: The total idle time and total busy time of the node's session connections -- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections -- Operation Latency: The latency of the various types of node operations, including average and P99 -- Average Interface Latency: The average latency of each Thrift interface of the node -- P99 Interface Latency: The P99 latency of each Thrift interface of the node -- Total Tasks: The number of system tasks for each node -- Average Task Latency: The average time spent on each type of system task of the node -- P99 Task Latency: The P99 time spent on each type of system task of the node -- Operations Per Second: The number of operations per second for the node -- Mainstream Process - - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process - - Average Stage Latency: The average latency of each stage in the node's main process - - P99 Stage Latency: The P99 latency of each stage in the node's main process -- Schedule Stage - - Schedule Operations Per Second: 
The number of operations per second in each sub stage of the node schedule stage - - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage - - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node - - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage - - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage - - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage - - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node - - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Utilization: CPU load of nodes -- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off-Heap Memory: Non heap memory usage of nodes -- Total Java Threads: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Logs Per Minute: Different types of logs per minute for nodes - -### 3.3 ConfigNode Dashboard - -This panel displays the performance of 
all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count:Number of DataRegions for nodes - - DataRegion Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemeRegions for nodes - - SchemaRegion Status: The state of the SchemeRegion of the node -- System Memory Utilization: The system memory size of the node -- Swap Memory Utilization: Node's swap memory size -- ConfigNodes Status: The running status of the ConfigNode in the cluster where the node is located -- DataNodes Status:The DataNode situation of the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node - - Created Clients Per Second: Number of created clients for each thread 
pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node -- Client time situation - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Latency: The time consumption of each stage of the node's Ratis -- Write Log Entry Latency: The time required to write a log for the Ratis of a node -- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write Throughput: Remote and local QPS written to node Ratis -- RatisConsensus Memory Utilization: Memory usage of Node Ratis consensus protocol - -### 3.4 DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
 - -#### Node Overview - -- Total Managed Entities: The entities managed by the node -- Write Throughput: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - Average Operation Latency: The average time spent on various operations of a node - - P50 Operation Latency: The median time spent on various operations of a node - - P99 Operation Latency: The P99 time spent on various operations of a node -- Thrift Statistics - - Thrift Interface QPS: QPS of each Thrift interface of the node - - Average Thrift Interface Latency: The average latency of each Thrift interface of the node - - Thrift Connections: The number of Thrift connections of each type on the node - - Active Thrift Threads: The number of active Thrift threads of each type on the node -- Client Statistics - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients for each thread pool of a node - - Created Clients Per Second: Number of created clients for each thread pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by the node -- File Size: The sizes of the various types of files managed by the node -- TsFile - - Total TsFile Size Per Level: The total size of TsFile files at each level of 
node management - - TsFile Count Per Level: Number of TsFile files at each level of node management - - Average TsFile Size Per Level: The average size of TsFile files at each level of node management -- Total Tasks: Number of Tasks for Nodes -- Task Latency: The time consumption of tasks for nodes -- Compaction - - Compaction Read/Write Throughput: The merge read and write speed of nodes per second - - Compactions Per Minute: The number of merged nodes per minute - - Compaction Chunk Status: The number of Chunks in different states merged by nodes - - Compacted-Points Per Minute: The number of merged nodes per minute - -#### Write Performance - -- Average Write Latency: Average node write time, including writing wal and memtable -- P50 Write Latency: Median node write time, including writing wal and memtable -- P99 Write Latency: P99 for node write time, including writing wal and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL Files:Number of WAL files managed by nodes - - WAL Nodes: Number of WAL nodes managed by nodes - - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes - - WAL Serialization Time (Total): Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Latency: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options - - WAL Buffer Used Ratio: 
The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage - - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage - - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage - - Average Flush Subtask Latency: The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P99 Flush Subtask Latency: The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: The number of Flush subtasks in a blocked state for a node -- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile produced when the node flushes a MemTable -- Flush TsFile Size of DataRegions: The size of the TsFile produced by each disk flush in the different DataRegions of a node -- Size of Flushing MemTable: The size of the MemTable being flushed to disk by the node -- Points Num of Flushing MemTable: The number of points flushed in the different DataRegions of a node -- Series Num of Flushing MemTable: The number of time series in the MemTables flushed in the different DataRegions of a node -- Average Point Num of Flushing MemChunk: The average number of points per MemChunk flushed to disk by the node - -#### Schema Engine - -- Schema Engine Mode: The metadata engine mode of the node -- Schema Consensus Protocol: The metadata consensus protocol of the node -- Schema Region Number: Number of SchemaRegions managed by the node -- Schema Region Memory Overview: The amount of memory used by the SchemaRegions of a node -- Memory Usgae per SchemaRegion: The 
average memory usage size of node SchemaRegion -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemeRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemeRegion of the node (valid only for SimpleConsense) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemeRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemeRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion - - Series Type: Number of time series of different types of nodes - - Time Series Number: The total number of time series nodes - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemeRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: Number of IMNode nodes with Pinned nodes in each SchemeRegion - - Pinned Memory per SchemaRegion: The memory usage size of the IMNode node for Pinned nodes in each SchemeRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemeRegion of a node - - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemeRegion of the node - - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes - - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing - 
-#### Query Engine - -- Time Consumption In Each Stage - - Average Query Plan Execution Time: The average time spent on node queries at each stage - - P50 Query Plan Execution Time: Median time spent on node queries at each stage - - P99 Query Plan Execution Time: P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution - - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution - - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time -- Execution Plan Execution Time - - Average Query Execution Time: The average execution time of node query execution plan - - P50 Query Execution Time:Median execution time of node query execution plan - - P99 Query Execution Time: P99 of node query execution plan execution time -- Operator Execution Time - - Average Query Operator Execution Time: The average execution time of node query operators - - P50 Query Operator Execution Time: Median execution time of node query operator - - P99 Query Operator Execution Time: P99 of node query operator execution time -- Aggregation Query Computation Time - - Average Query Aggregation Execution Time: The average computation time for node aggregation queries - - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries - - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes - - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes - - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - Average Query Resource Utilization: The average number of resource visits for node queries - - P50 Query 
Resource Utilization: Median number of resource visits for node queries - - P99 Query Resource Utilization: P99 for node query resource access quantity -- Data Transmission Time - - Average Query Data Exchange Latency: The average time spent on node query data transmission - - P50 Query Data Exchange Latency: Median query data transmission time for nodes - - P99 Query Data Exchange Latency: P99 for node query data transmission time -- Number Of Data Transfers - - Average Query Data Exchange Count: The average number of data transfers queried by nodes - - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - Query Queue Length: Node query task scheduling quantity - - Average Query Scheduling Latency: The average time spent on scheduling node query tasks - - P50 Query Scheduling Latency: Median time spent on node query task scheduling - - P99 Query Scheduling Latency: P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata - - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries - - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata -- Read Time Series - - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series - - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series - - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series -- Modify Time Series Metadata - - Average Timeseries Metadata Modification Time:The average time taken for node queries to modify time series metadata - - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for 
nodes
-  - P99 Timeseries Metadata Modification Time: The P99 time node queries spend modifying time series metadata
-- Load Chunk Metadata List
-  - Average Chunk Metadata List Load Time: The average time node queries spend loading the Chunk metadata list
-  - P50 Chunk Metadata List Load Time: The median time node queries spend loading the Chunk metadata list
-  - P99 Chunk Metadata List Load Time: The P99 time node queries spend loading the Chunk metadata list
-- Modify Chunk Metadata
-  - Average Chunk Metadata Modification Time: The average time node queries spend modifying Chunk metadata
-  - P50 Chunk Metadata Modification Time: The median time node queries spend modifying Chunk metadata
-  - P99 Chunk Metadata Modification Time: The P99 time node queries spend modifying Chunk metadata
-- Filter According To Chunk Metadata
-  - Average Chunk Metadata Filtering Time: The average time node queries spend filtering by Chunk metadata
-  - P50 Chunk Metadata Filtering Time: The median time node queries spend filtering by Chunk metadata
-  - P99 Chunk Metadata Filtering Time: The P99 time node queries spend filtering by Chunk metadata
-- Constructing Chunk Reader
-  - Average Chunk Reader Construction Time: The average time node queries spend constructing the Chunk Reader
-  - P50 Chunk Reader Construction Time: The median time node queries spend constructing the Chunk Reader
-  - P99 Chunk Reader Construction Time: The P99 time node queries spend constructing the Chunk Reader
-- Read Chunk
-  - Average Chunk Read Time: The average time node queries spend reading Chunks
-  - P50 Chunk Read Time: The median time node queries spend reading Chunks
-  - P99 Chunk Read Time: The P99 time node queries spend reading Chunks
-- Initialize Chunk Reader
-  - Average Chunk Reader Initialization Time: The average time node queries spend initializing the Chunk Reader
-  - P50 Chunk Reader Initialization Time: The median time node queries spend initializing the Chunk Reader
-  - P99 Chunk Reader Initialization Time: The P99 time node queries spend initializing the Chunk Reader
-- Constructing TsBlock Through Page Reader
-  - Average TsBlock Construction Time from Page Reader: The average time node queries spend constructing TsBlock through the Page Reader
-  - P50 TsBlock Construction Time from Page Reader: The median time node queries spend constructing TsBlock through the Page Reader
-  - P99 TsBlock Construction Time from Page Reader: The P99 time node queries spend constructing TsBlock through the Page Reader
-- Constructing TsBlock Through Merge Reader
-  - Average TsBlock Construction Time from Merge Reader: The average time node queries spend constructing TsBlock through the Merge Reader
-  - P50 TsBlock Construction Time from Merge Reader: The median time node queries spend constructing TsBlock through the Merge Reader
-  - P99 TsBlock Construction Time from Merge Reader: The P99 time node queries spend constructing TsBlock through the Merge Reader
-
-#### Query Data Exchange
-
-Time consumed by data exchange during queries.
-
-- Obtain TsBlock through source handle
-  - Average Source Handle TsBlock Retrieval Time: The average time node queries spend obtaining TsBlock through the source handle
-  - P50 Source Handle TsBlock Retrieval Time: The median time node queries spend obtaining TsBlock through the source handle
-  - P99 Source Handle TsBlock Retrieval Time: The P99 time node queries spend obtaining TsBlock through the source handle
-- Deserialize TsBlock through source handle
-  - Average Source Handle TsBlock Deserialization Time: The average time node queries spend deserializing TsBlock through the source handle
-  - P50 Source Handle TsBlock Deserialization Time: The median time node queries spend deserializing TsBlock through the source handle
-  - P99 Source Handle TsBlock Deserialization Time: The P99 time node queries spend deserializing TsBlock through the source handle
-- Send TsBlock through sink handle
-  - Average Sink Handle TsBlock Transmission Time: The average time node queries spend sending TsBlock through the sink handle
-  - P50 Sink Handle TsBlock Transmission Time: The median time node queries spend sending TsBlock through the sink handle
-  - P99 Sink Handle TsBlock Transmission Time: The P99 time node queries spend sending TsBlock through the sink handle
-- Callback data block event
-  - Average Data Block Event Acknowledgment Time: The average time node queries spend on data block event callbacks
-  - P50 Data Block Event Acknowledgment Time: The median time node queries spend on data block event callbacks
-  - P99 Data Block Event Acknowledgment Time: The P99 time node queries spend on data block event callbacks
-- Get Data Block Tasks
-  - Average Data Block Task Retrieval Time: The average time node queries spend obtaining data block tasks
-  - P50 Data Block Task Retrieval Time: The median time node queries spend obtaining data block tasks
-  - P99 Data Block Task Retrieval Time: The P99 time node queries spend obtaining data block tasks
-
-#### Query Related Resource
-
-- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries
-- LocalExecutionPlanner: The remaining memory that the node can allocate to query fragments
-- FragmentInstanceManager: The query fragment context information and the number of query fragments running on the node
-- Coordinator: The number of queries recorded on the node
-- MemoryPool Size: The status of the memory pools related to node queries
-- MemoryPool Capacity: The size of the memory pools related to node queries, including the maximum and remaining available values
-- DriverScheduler Count: The number of queued tasks related to node queries
-
-#### Consensus - IoT Consensus
-
-- Memory Usage
-  - IoTConsensus Used Memory: The memory usage of IoTConsensus on the node, including total memory usage, queue usage, and synchronization usage
-- Synchronization Status Between Nodes
-  - IoTConsensus Sync Index Size: The SyncIndex size of IoTConsensus for the node's different DataRegions
-  - IoTConsensus Overview: The total synchronization gap and cached request count of IoTConsensus on the node
-  - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex of IoTConsensus for the node's different DataRegions
-  - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronized SafeIndex of IoTConsensus for the node's different DataRegions
-  - IoTConsensus LogDispatcher Request Size: The size of the requests IoTConsensus sends to synchronize the node's different DataRegions to other nodes
-  - Sync Lag: The synchronization gap of IoTConsensus for the node's different DataRegions
-  - Min Peer Sync Lag: The minimum synchronization gap of IoTConsensus from the node's different DataRegions to their different replicas
-  - Peer Sync Speed Difference: The maximum difference in synchronization of IoTConsensus from the node's different DataRegions to their different replicas
-  - IoTConsensus LogEntriesFromWAL Rate: The rate at which IoTConsensus obtains logs from the WAL for the node's different DataRegions
-  - IoTConsensus LogEntriesFromQueue Rate: The rate at which IoTConsensus obtains logs from the queue for the node's different DataRegions
-- Different Execution Stages Take Time
-  - The Time Consumed of Different Stages (avg): The average time spent in the different execution stages of IoTConsensus on the node
-  - The Time Consumed of Different Stages (50%): The median time spent in the different execution stages of IoTConsensus on the node
-  - The Time Consumed of Different Stages (99%): The P99 time spent in the different execution stages of IoTConsensus on the node
-
-#### Consensus - DataRegion Ratis Consensus
-
-- Ratis Consensus Stage Latency: The time consumed by the different stages of Ratis on the node
-- Ratis Log Write Latency: The time consumed by writing logs at the different stages of Ratis on the node
-- Remote / Local Write Latency: The time Ratis on the node takes to write locally or remotely
-- Remote / Local Write Throughput(QPS): The QPS of local or remote writes by Ratis on the node
-- RatisConsensus Memory Usage: The memory usage of Ratis on the node
-
-#### Consensus - SchemaRegion Ratis Consensus
-
-- RatisConsensus Stage Latency: The time consumed by the different stages of Ratis on the node
-- Ratis Log Write Latency: The time consumed by writing logs at the different stages of Ratis on the node
-- Remote / Local Write Latency: The time Ratis on the node takes to write locally or remotely
-- Remote / Local Write Throughput(QPS): The QPS of local or remote writes by Ratis on the node
-- RatisConsensus Memory Usage: The memory usage of Ratis on the node
diff --git a/src/UserGuide/Master/Table/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/Master/Table/IoTDB-Introduction/Commercial-Support_apache.md
index 349dcb064..976e153c5 100644
--- a/src/UserGuide/Master/Table/IoTDB-Introduction/Commercial-Support_apache.md
+++ b/src/UserGuide/Master/Table/IoTDB-Introduction/Commercial-Support_apache.md
@@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver
 | |
Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer | | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member | | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member | diff --git a/src/UserGuide/Master/Table/Reference/System-Tables_apache.md b/src/UserGuide/Master/Table/Reference/System-Tables_apache.md index 702a9c15a..1ffc07358 100644 --- a/src/UserGuide/Master/Table/Reference/System-Tables_apache.md +++ b/src/UserGuide/Master/Table/Reference/System-Tables_apache.md @@ -524,7 +524,6 @@ IoTDB> select * from information_schema.keywords limit 10 | internal\_port | INT32 | ATTRIBUTE | Internal port | | version | STRING | ATTRIBUTE | Version number | | build\_info | STRING | ATTRIBUTE | Commit ID | -| activate\_status (Enterprise Edition only) | STRING | ATTRIBUTE | Activation status | * Only administrators are allowed to perform operations on this table. * Query example: diff --git a/src/UserGuide/Master/Table/User-Manual/Load-Balance.md b/src/UserGuide/Master/Table/User-Manual/Load-Balance.md index ef69e1def..cfc42679f 100644 --- a/src/UserGuide/Master/Table/User-Manual/Load-Balance.md +++ b/src/UserGuide/Master/Table/User-Manual/Load-Balance.md @@ -211,7 +211,7 @@ Total line number = 4 It costs 0.110s ``` -7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. Otherwise, you will need to contact the support team to reapply for activation code information. +7. Repeat the above steps for other nodes. 
It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. #### 1.3.3 Manual Load Balancing (Optional) diff --git a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md index 7ba937040..e3b6b59b4 100644 --- a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -274,7 +274,7 @@ On each server, two yml files need to be written, namely confignnode. yml and da version: "3" services: iotdb-confignode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-confignode command: ["bash", "-c", "entrypoint.sh confignode"] @@ -309,7 +309,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 41c28734c..000000000 --- a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,694 +0,0 @@ - -# Monitoring Panel Deployment - -The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. 
It addresses the monitoring needs of IoTDB and the operating system it runs on, covering operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring metrics, to help users monitor cluster health and carry out cluster tuning and operations. This article uses a typical 3C3D cluster (3 ConfigNodes and 3 DataNodes) as an example to show how to enable the system monitoring module in an IoTDB instance and visualize the system monitoring metrics with Prometheus + Grafana.
-
-The instructions for using the monitoring panel tool can be found in the [Instructions](../Tools-System/Monitor-Tool.md) section of the document.
-
-## 1. Installation Preparation
-
-1. Install IoTDB: First install IoTDB Enterprise Edition V1.0 or above. Contact sales or technical support to obtain the installation package.
-2. Obtain the IoTDB monitoring panel installation package: The monitoring panel is provided with the Enterprise Edition of IoTDB. Contact sales or technical support to obtain it.
-
-## 2. Installation Steps
-
-### 2.1 Enable Monitoring Metrics Collection in IoTDB
-
-1. Enable the monitoring configuration items. The monitoring-related configuration items in IoTDB are disabled by default. Before deploying the monitoring panel, enable the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration).
-
-| **Configuration** | **Configuration File** | **Description** |
-| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- |
-| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS |
-| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT |
-| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and keep the default port 9091, or set another port (ensure it does not conflict with other ports) |
-| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS |
-| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT |
-| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and keep the default port 9092, or set another port (ensure it does not conflict with other ports) |
-
-Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows:
-
-| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration |
-| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ |
-| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-
-2. Restart all nodes.
After modifying the monitoring configuration on all three nodes, restart the ConfigNode and DataNode services on every node:
-
-```Bash
-# Unix/OS X
-./sbin/stop-standalone.sh #Stop confignode and datanode first
-./sbin/start-confignode.sh -d #Start confignode
-./sbin/start-datanode.sh -d #Start datanode
-
-# Windows
-# Before version V2.0.4.x
-.\sbin\stop-standalone.bat
-.\sbin\start-confignode.bat
-.\sbin\start-datanode.bat
-
-# V2.0.4.x and later versions
-.\sbin\windows\stop-standalone.bat
-.\sbin\windows\start-confignode.bat
-.\sbin\windows\start-datanode.bat
-```
-
-3. After restarting, confirm the running status of each node through the client. If the status is Running, the configuration succeeded:
-
-![](/img/%E5%90%AF%E5%8A%A8.png)
-
-### 2.2 Install and configure Prometheus
-
-> Taking Prometheus installed on server 192.168.1.3 as an example.
-
-1. Download the Prometheus installation package (V2.30.3 or above is required) from the official Prometheus website (https://prometheus.io/docs/introduction/first_steps/).
-2. Unzip the installation package and enter the unzipped folder:
-
-```Shell
-tar xvfz prometheus-*.tar.gz
-cd prometheus-*
-```
-
-3. Modify the configuration. Edit the configuration file prometheus.yml as follows:
-   1. Add a confignode job to collect monitoring data from the ConfigNodes
-   2. Add a datanode job to collect monitoring data from the DataNodes
-
-```YAML
-global:
-  scrape_interval: 15s
-  evaluation_interval: 15s
-scrape_configs:
-  - job_name: "prometheus"
-    static_configs:
-      - targets: ["localhost:9090"]
-  - job_name: "confignode"
-    static_configs:
-      - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"]
-    honor_labels: true
-  - job_name: "datanode"
-    static_configs:
-      - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"]
-    honor_labels: true
-```
-
-4. Start Prometheus. The default retention period for Prometheus monitoring data is 15 days.
In production environments, it is recommended to increase it to 180 days or more so that historical monitoring data can be tracked over a longer period. The startup command is as follows:
-
-```Shell
-./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d
-```
-
-5. Confirm successful startup. Open http://192.168.1.3:9090 in a browser to enter Prometheus, then click Targets under Status. When the State of every target is Up, the configuration and connectivity are working.
-
-
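The check in step 5 can also be scripted. Prometheus reports the state of its scrape targets through the HTTP API endpoint `/api/v1/targets`. The Python sketch below assumes the response shape documented for that endpoint (`data.activeTargets[]` entries carrying `labels` and `health`) and lists every target that is not `up`. The helper name `unhealthy_targets` and the sample payload are hypothetical and used only for illustration.

```python
import json


def unhealthy_targets(payload: str):
    """Return (job, instance, health) for every active target that is not 'up'."""
    body = json.loads(payload)
    targets = body.get("data", {}).get("activeTargets", [])
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("health"))
        for t in targets
        if t.get("health") != "up"
    ]


# Hypothetical, trimmed response. In practice the payload would be fetched from
# http://192.168.1.3:9090/api/v1/targets, for example with urllib.request.
sample = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "confignode", "instance": "iotdb-1:9091"}, "health": "up"},
        {"labels": {"job": "datanode", "instance": "iotdb-2:9092"}, "health": "down"},
    ]},
})

bad = unhealthy_targets(sample)
print(bad)
```

An empty list corresponds to the "all States are Up" condition described in step 5.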
-
-6. Click a link on the left side of the Targets page to jump to the web monitoring page and view the monitoring information of the corresponding node:
-
-![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png)
-
-### 2.3 Install Grafana and configure the data source
-
-> Taking Grafana installed on server 192.168.1.3 as an example.
-
-1. Download the Grafana installation package (version 8.4.2 or higher is required) from the official Grafana website (https://grafana.com/grafana/download).
-2. Unzip the package and enter the unzipped folder:
-
-```Shell
-tar -zxvf grafana-*.tar.gz
-cd grafana-*
-```
-
-3. Start Grafana:
-
-```Shell
-./bin/grafana-server web
-```
-
-4. Log in to Grafana. Open http://192.168.1.3:3000 (or the modified port) in a browser. The default initial username and password are both admin.
-
-5. Configure the data source. Find Data sources under Connections, add a new data source, and set its type to Prometheus.
-
-![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png)
-
-When configuring the data source, make sure the URL points to where Prometheus is running. After configuring it, click Save & Test. The prompt "Data source is working" indicates successful configuration.
-
-![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png)
-
-### 2.4 Import IoTDB Grafana Dashboards
-
-1. Enter Grafana and select Dashboards:
-
-   ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png)
-
-2. Click the Import button on the right side:
-
-   ![](/img/Import%E6%8C%89%E9%92%AE.png)
-
-3. Import the dashboard by uploading a JSON file:
-
-   ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png)
-
-4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package):
-
-   ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png)
-
-5. Select Prometheus as the data source and click Import:
-
-   ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png)
-
-6. Afterwards, the imported Apache IoTDB ConfigNode Dashboard monitoring panel is displayed:
-
-   ![](/img/%E9%9D%A2%E6%9D%BF.png)
-
-7. Similarly, import the Apache IoTDB DataNode Dashboard, Apache Performance Overview Dashboard, and Apache System Overview Dashboard. The following monitoring panels are then displayed:
-
-
-
-8. At this point, all IoTDB monitoring panels have been imported and monitoring information can be viewed at any time.
-
-   ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png)
-
-## 3. Appendix, Detailed Explanation of Monitoring Indicators
-
-### 3.1 System Dashboard
-
-This panel displays the current usage of system CPU, memory, disk, and network resources, as well as part of the JVM status.
-
-#### CPU
-
-- CPU Cores: The number of CPU cores
-- CPU Utilization:
-  - System CPU Utilization: The average CPU load and busyness of the entire system during the sampling time
-  - Process CPU Utilization: The proportion of CPU occupied by the IoTDB process during the sampling time
-- CPU Time Per Minute: The total CPU time of all processes in the system per minute
-
-#### Memory
-
-- System Memory: The current usage of system memory.
-  - Commited VM Size: The size of virtual memory allocated by the operating system to running processes.
-  - Total Physical Memory: The total amount of available physical memory in the system.
-  - Used Physical Memory: The total amount of memory already used by the system, including the memory actually used by processes and the memory occupied by operating system buffers/cache.
-- System Swap Memory: Swap space memory usage.
-- Process Memory: The memory usage of the IoTDB process.
-  - Max Memory: The maximum amount of memory that the IoTDB process can request from the operating system. (The allocated memory size is configured in the datanode-env/confignode-env configuration file.)
-  - Total Memory: The total amount of memory that the IoTDB process has currently requested from the operating system.
-  - Used Memory: The total amount of memory currently used by the IoTDB process.
-
-#### Disk
-
-- Disk Space:
-  - Total Disk Space: The maximum disk space that IoTDB can use.
-  - Used Disk Space: The disk space already used by IoTDB.
-- Logs Per Minute: The average number of IoTDB logs at each level per minute during the sampling time.
-- File Count: The number of IoTDB-related files
-  - All: The number of all files
-  - TsFile: The number of TsFiles
-  - Seq: The number of sequence TsFiles
-  - Unseq: The number of unsequence TsFiles
-  - WAL: The number of WAL files
-  - Cross-Temp: The number of cross-space compaction temp files
-  - Inner-Seq-Temp: The number of compaction temp files in the sequence space
-  - Innsr-Unseq-Temp: The number of compaction temp files in the unsequence space
-  - Mods: The number of tombstone files
-- Open File Handles: The number of file handles opened by the system
-- File Size: The size of IoTDB-related files. Each sub-item is the size of the corresponding file type.
-- Disk Utilization (%): Equivalent to the %util indicator in iostat. To some extent it reflects how busy the disk is. Each sub-item is the indicator for the corresponding disk.
-- Disk I/O Throughput: The average I/O throughput of each disk in the system over a period of time. Each sub-item is the indicator for the corresponding disk.
-- Disk IOPS: Equivalent to the four indicators r/s, w/s, rrqm/s, and wrqm/s in iostat, that is, the number of I/O operations a disk performs per second. Read and write refer to the number of single I/O operations the disk performs. Because of the scheduling algorithm of block devices, multiple adjacent I/Os can in some cases be merged into one. Merged read and merged write refer to the number of times multiple I/Os are merged into one I/O.
-- Disk I/O Latency (Avg): Equivalent to await in iostat, that is, the average latency of each I/O request. Read and write requests are recorded separately.
-- Disk I/O Request Size (Avg): Equivalent to avgrq-sz in iostat. It reflects the size of each I/O request. Read and write requests are recorded separately.
-- Disk I/O Queue Length (Avg): Equivalent to avgqu-sz in iostat, that is, the average length of the I/O request queue.
-- I/O Syscall Rate: The frequency at which the process invokes read and write system calls, similar to IOPS.
-- I/O Throughput: The I/O throughput of the process, divided into two categories: actual-read/write and attempt-read/write. Actual read and actual write refer to the number of bytes of I/O that the process actually causes block devices to perform, excluding the parts handled by the Page Cache.
-
-#### JVM
-
-- GC Time Percentage: The proportion of time the node JVM spent on GC within the past one-minute window
-- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute by the node JVM, as well as the size of objects newly allocated in the young generation, the old generation, and non-generational memory
-- GC Live Data Size: The size of long-lived objects in the node JVM and the maximum value allowed for the corresponding generation
-- Heap Memory: JVM heap memory usage.
-  - Maximum heap memory: The maximum available heap memory size for the JVM.
-  - Committed heap memory: The size of heap memory that has been committed by the JVM.
-  - Used heap memory: The size of heap memory already used by the JVM.
-  - PS Eden Space: The size of the PS Young area.
-  - PS Old Space: The size of the PS Old area.
-  - PS Survivor Space: The size of the PS Survivor area.
-  - ...(CMS/G1/ZGC, etc)
-- Off-Heap Memory: Off-heap memory usage.
-  - Direct Memory: Off-heap direct memory.
-  - Mapped Memory: Off-heap mapped memory.
-- GCs Per Minute: The average number of garbage collections per minute by the node JVM, including YGC and FGC -- GC Latency Per Minute: The average time the node JVM spends on garbage collection per minute, including YGC and FGC -- GC Events Breakdown Per Minute: The average number of garbage collections per minute by the node JVM, broken down by cause, including YGC and FGC -- GC Pause Time Breakdown Per Minute: The average time the node JVM spends on garbage collection per minute, broken down by cause, including YGC and FGC -- JIT Compilation Time Per Minute: The total time the JVM spends on compilation per minute -- Loaded & Unloaded Classes: - - Loaded: The number of classes currently loaded by the JVM - - Unloaded: The number of classes unloaded by the JVM since system startup -- Active Java Threads: The current number of live threads in IoTDB. Each sub-item is the number of threads in a given state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the loopback (virtual) network interface.
- -- Network Speed:The speed of network card sending and receiving data -- Network Throughput (Receive/Transmit):The size of data packets sent or received by the network card, calculated from system restart -- Packet Transmission Rate:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Active TCP Connections:The current number of socket connections for the selected process (IoTDB only has TCP) - -### 3.2 Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Cores:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster -- Total Time Series: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Throughput: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster -- Total Files:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBases: The total number of databases managed by the cluster (including replicas) -- Total DataRegions: The total number of DataRegions managed by the cluster -- Total SchemaRegions: The total number of 
SchemaRegions managed by the cluster - -#### Node Overview - -- CPU Cores: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Time Series: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Throughput: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory: The swap memory size of the machine where the node is located -- File Count: Number of files managed by nodes - -#### Performance - -- Session Idle Time: The total idle time and total busy time of the session connection of the node -- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections -- Operation Latency: The time consumption of various types of node operations, including average and P99 -- Average Interface Latency: The average time consumption of each Thrift interface of a node -- P99 Interface Latency: P99 time consumption of each Thrift interface of a node -- Total Tasks: The number of system tasks for each node -- Average Task Latency: The average time spent on various system tasks of a node -- P99 Task Latency: P99 time consumption for various system tasks of nodes -- Operations Per Second: The number of operations per second for a node -- Mainstream Process - - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process - - Average Stage Latency: The average time consumption of each stage in the main process of a node - - P99 Stage Latency: P99 time consumption for each stage of the node's main process -- Schedule Stage - - Schedule Operations Per Second:
The number of operations per second in each sub stage of the node schedule stage - - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage - - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node - - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage - - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage - - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage - - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node - - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Utilization: CPU load of nodes -- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off-Heap Memory: Non heap memory usage of nodes -- Total Java Threads: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Logs Per Minute: Different types of logs per minute for nodes - -### 3.3 ConfigNode Dashboard - -This panel displays the performance of 
all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count: Number of DataRegions for nodes - - DataRegion Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemaRegions for nodes - - SchemaRegion Status: The state of the SchemaRegion of the node -- System Memory Utilization: The system memory size of the node -- Swap Memory Utilization: Node's swap memory size -- ConfigNodes Status: The running status of the ConfigNode in the cluster where the node is located -- DataNodes Status: The status of the DataNodes in the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution: The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node - - Created Clients Per Second: Number of created clients for each thread
pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node -- Client time situation - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Latency: The time consumption of each stage of the node's Ratis -- Write Log Entry Latency: The time required to write a log for the Ratis of a node -- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write Throughput: Remote and local QPS written to node Ratis -- RatisConsensus Memory Utilization: Memory usage of Node Ratis consensus protocol - -### 3.4 DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
- -#### Node Overview - -- Total Managed Entities: The entities managed by the node -- Write Throughput: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - Average Operation Latency: The average time spent on various operations of a node - - P50 Operation Latency: The median time spent on various operations of a node - - P99 Operation Latency: P99 time consumption for various operations of nodes -- Thrift Statistics - - Thrift Interface QPS: QPS of various Thrift interfaces of nodes - - Average Thrift Interface Latency: The average time consumption of each Thrift interface of a node - - Thrift Connections: The number of Thrift connections of each type on the node - - Active Thrift Threads: The number of active Thrift connections for each type of node -- Client Statistics - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients for each thread pool of a node - - Created Clients Per Second: Number of created clients for each thread pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by nodes -- File Size: The size of files of various types managed by nodes -- TsFile - - Total TsFile Size Per Level: The total size of TsFile files at each level of
node management - - TsFile Count Per Level: Number of TsFile files at each level of node management - - Average TsFile Size Per Level: The average size of TsFile files at each level of node management -- Total Tasks: Number of tasks for nodes -- Task Latency: The time consumption of tasks for nodes -- Compaction - - Compaction Read/Write Throughput: The compaction read and write speed of nodes per second - - Compactions Per Minute: The number of compactions performed by nodes per minute - - Compaction Chunk Status: The number of Chunks in different states compacted by nodes - - Compacted-Points Per Minute: The number of points compacted by nodes per minute - -#### Write Performance - -- Average Write Latency: Average node write time, including writing WAL and memtable -- P50 Write Latency: Median node write time, including writing WAL and memtable -- P99 Write Latency: P99 for node write time, including writing WAL and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL Files: Number of WAL files managed by nodes - - WAL Nodes: Number of WAL nodes managed by nodes - - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes - - WAL Serialization Time (Total): Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Latency: The time the node WAL takes to flush the SyncBuffer, including both synchronous and asynchronous options - - WAL Buffer Used Ratio:
The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage - - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage - - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage - - Average Flush Subtask Latency: The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P99 Flush Subtask Latency: The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes -- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile produced when the node flushes a Memtable -- Flush TsFile Size of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions -- Size of Flushing MemTable: The size of the Memtable for node disk flushing -- Points Num of Flushing MemTable: The number of points when flushing data in different DataRegions of a node -- Series Num of Flushing MemTable: The number of time series when flushing Memtables in different DataRegions of a node -- Average Point Num of Flushing MemChunk: The average number of disk flushing points for node MemChunk - -#### Schema Engine - -- Schema Engine Mode: The metadata engine mode of nodes -- Schema Consensus Protocol: Node metadata consensus protocol -- Schema Region Number: Number of SchemaRegions managed by nodes -- Schema Region Memory Overview: The amount of memory in the SchemaRegion of a node -- Memory Usage per SchemaRegion: The
average memory usage size of node SchemaRegion -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemaRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemaRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion - - Series Type: Number of time series of different types of nodes - - Time Series Number: The total number of time series of nodes - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of time series created through templates in each SchemaRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: The number of pinned IMNode nodes in each SchemaRegion of a node - - Pinned Memory per SchemaRegion: The memory usage size of pinned IMNode nodes in each SchemaRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemaRegion of a node - - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemaRegion of the node - - Schema File Memory MNode Number: The global number of pinned and unpinned IMNode nodes - - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Release and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Release and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing -
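Many panels in these dashboards report an average alongside P50 and P99 values for the same latency. The following is a minimal sketch of how the three summaries can diverge on one set of samples. It assumes nearest-rank percentiles; the function names and sample values are hypothetical, and the quantile estimation actually used by the IoTDB metrics module may differ.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latency(samples_ms):
    """Summarize latency samples the way an avg/P50/P99 panel trio would."""
    return {
        "avg": statistics.fmean(samples_ms),
        "p50": nearest_rank_percentile(samples_ms, 50),
        "p99": nearest_rank_percentile(samples_ms, 99),
    }


# Hypothetical samples: 98 fast requests and 2 slow outliers (milliseconds).
samples = [2.0] * 98 + [500.0] * 2
summary = summarize_latency(samples)
print(summary)
```

With these samples the average sits well above the median because two slow requests dominate it, which is why a P99 panel is usually the better place to look for tail latency.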
-#### Query Engine - -- Time Consumption In Each Stage - - Average Query Plan Execution Time: The average time spent on node queries at each stage - - P50 Query Plan Execution Time: Median time spent on node queries at each stage - - P99 Query Plan Execution Time: P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution - - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution - - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time -- Execution Plan Execution Time - - Average Query Execution Time: The average execution time of node query execution plan - - P50 Query Execution Time:Median execution time of node query execution plan - - P99 Query Execution Time: P99 of node query execution plan execution time -- Operator Execution Time - - Average Query Operator Execution Time: The average execution time of node query operators - - P50 Query Operator Execution Time: Median execution time of node query operator - - P99 Query Operator Execution Time: P99 of node query operator execution time -- Aggregation Query Computation Time - - Average Query Aggregation Execution Time: The average computation time for node aggregation queries - - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries - - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes - - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes - - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - Average Query Resource Utilization: The average number of resource visits for node queries - - P50 Query 
Resource Utilization: Median number of resource visits for node queries - - P99 Query Resource Utilization: P99 for node query resource access quantity -- Data Transmission Time - - Average Query Data Exchange Latency: The average time spent on node query data transmission - - P50 Query Data Exchange Latency: Median query data transmission time for nodes - - P99 Query Data Exchange Latency: P99 for node query data transmission time -- Number Of Data Transfers - - Average Query Data Exchange Count: The average number of data transfers queried by nodes - - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - Query Queue Length: Node query task scheduling quantity - - Average Query Scheduling Latency: The average time spent on scheduling node query tasks - - P50 Query Scheduling Latency: Median time spent on node query task scheduling - - P99 Query Scheduling Latency: P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata - - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries - - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata -- Read Time Series - - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series - - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series - - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series -- Modify Time Series Metadata - - Average Timeseries Metadata Modification Time:The average time taken for node queries to modify time series metadata - - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for 
nodes - - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata -- Load Chunk Metadata List - - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists - - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list - - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list -- Modify Chunk Metadata - - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata - - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries - - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata -- Filter According To Chunk Metadata - - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata - - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata - - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata -- Constructing Chunk Reader - - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries - - P50 Chunk Reader Construction Time: Median time spent on constructing Chunk Reader for node queries - - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries -- Read Chunk - - Average Chunk Read Time: The average time taken for node queries to read Chunks - - P50 Chunk Read Time: Median time taken for node queries to read Chunks - - P99 Chunk Read Time: P99 time taken for node queries to read Chunks -- Initialize Chunk Reader - - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries - - P50 Chunk Reader Initialization Time: Median
time spent initializing Chunk Reader for node queries - - P99 Chunk Reader Initialization Time:P99 time spent initializing Chunk Reader for node queries -- Constructing TsBlock Through Page Reader - - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader - - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries - - P99 TsBlock Construction Time from Page Reader:Node query using Page Reader to construct TsBlock time-consuming P99 -- Query the construction of TsBlock through Merge Reader - - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader - - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries - - P99 TsBlock Construction Time from Merge Reader: Node query using Merge Reader to construct TsBlock time-consuming P99 - -#### Query Data Exchange - -The data exchange for the query is time-consuming. 
- -- Obtain TsBlock through source handle - - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle - - P50 Source Handle TsBlock Retrieval Time:Node query obtains the median time spent on TsBlock through source handle - - P99 Source Handle TsBlock Retrieval Time: Node query obtains TsBlock time P99 through source handle -- Deserialize TsBlock through source handle - - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle - - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle - - P99 Source Handle TsBlock Deserialization Time: P99 time spent on deserializing TsBlock through source handle for node query -- Send TsBlock through sink handle - - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle - - P50 Sink Handle TsBlock Transmission Time: Node query median time spent sending TsBlock through sink handle - - P99 Sink Handle TsBlock Transmission Time: Node query sends TsBlock through sink handle with a time consumption of P99 -- Callback data block event - - Average Data Block Event Acknowledgment Time: The average time taken for node query callback data block event - - P50 Data Block Event Acknowledgment Time: Median time spent on node query callback data block event - - P99 Data Block Event Acknowledgment Time: P99 time consumption for node query callback data block event -- Get Data Block Tasks - - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks - - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks - - P99 Data Block Task Retrieval Time: P99 time consumption for node query to obtain data block task - -#### Query Related Resource - -- 
MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries -- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards -- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running -- Coordinator: The number of queries recorded on the node -- MemoryPool Size: The status of memory pools related to node queries -- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values -- DriverScheduler Count: Number of queue tasks related to node queries - -#### Consensus - IoT Consensus - -- Memory Usage - - IoTConsensus Used Memory: The memory usage of IoTConsensus for nodes, including total memory usage, queue usage, and synchronization usage -- Synchronization Status Between Nodes - - IoTConsensus Sync Index Size: SyncIndex size for different DataRegions of node IoTConsensus - - IoTConsensus Overview: The total synchronization gap and cached request count of IoTConsensus for nodes - - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex for different DataRegions of node IoTConsensus - - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronized SafeIndex for different DataRegions of node IoTConsensus - - IoTConsensus LogDispatcher Request Size: The request size for node IoTConsensus to synchronize different DataRegions to other nodes - - Sync Lag: The size of the synchronization gap for different DataRegions of node IoTConsensus - - Min Peer Sync Lag: The minimum synchronization gap between different DataRegions and different replicas of node IoTConsensus - - Peer Sync Speed Difference: The maximum difference in synchronization from different DataRegions to different replicas for node IoTConsensus - - IoTConsensus LogEntriesFromWAL Rate: The rate at which node IoTConsensus obtains logs from WAL for different DataRegions - - IoTConsensus
LogEntriesFromQueue Rate: The rate at which node IoTConsensus retrieves logs from the queue for different DataRegions -- Different Execution Stages Take Time - - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of node IoTConsensus - - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of node IoTConsensus - - The Time Consumed of Different Stages (99%): P99 of the time consumption for different execution stages of node IoTConsensus - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis diff --git a/src/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md b/src/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md index 9a22a1bf4..6a7ced920 100644 --- a/src/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md +++ b/src/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md @@ -43,12 +43,12 @@ | :-------------------- | :----------------------------------------------------------- | | IoTDB | Version not required, please refer to
[Deployment Guidance](../QuickStart/QuickStart_apache.md) |
 | JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) |
-| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact staff) |
-| DataEase-IoTDB Connector | Please contact staff for assistance |
+| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported) |
+| DataEase-IoTDB Connector | Obtain the installation package |
 
 ## 3. Installation Steps
 
-Step 1: Please contact staff to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip`
+Step 1: Unzip the installation package `iotdb-api-source-1.0.0.zip`
 
 Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder
 
diff --git a/src/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
index adb1f172e..5a2688cac 100644
--- a/src/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
+++ b/src/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
@@ -42,13 +42,13 @@
 | :---------------------------------------- | :----------------------------------------------------------- |
 | JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) |
 | IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package.md) |
-| ThingsBoard (IoTDB adapted version) | Please contact commercial support to obtain the installation package. Detailed installation steps are provided below. |
+| ThingsBoard (IoTDB adapted version) | Obtain the installation package. Detailed installation steps are provided below. |
 
 ## 3. Installation Steps
 
 Please refer to the installation steps on [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/),wherein:
 
-- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the installation package provided by your contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB.
+- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the obtained installation package to install the software. Please note that the official ThingsBoard installation package does not support IoTDB.
 - [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/) 【Step 3: Configure ThingsBoard Database - ThingsBoard Configuration】 In this step, you need to add environment variables according to the following content
 
 ```Shell
diff --git a/src/UserGuide/Master/Tree/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/Master/Tree/IoTDB-Introduction/Commercial-Support_apache.md
index 349dcb064..976e153c5 100644
--- a/src/UserGuide/Master/Tree/IoTDB-Introduction/Commercial-Support_apache.md
+++ b/src/UserGuide/Master/Tree/IoTDB-Introduction/Commercial-Support_apache.md
@@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver
 
 | | Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level |
 | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 
------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer |
 | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member |
 | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member |
 
diff --git a/src/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
index c22a1cf9b..664f104ad 100644
--- a/src/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
@@ -29,10 +29,10 @@ Based on the ability of user-defined functions, IoTDB provides a series of funct
 
 1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version.
 
- | UDF installation package | Supported IoTDB versions | Download link |
- | --------------- | ----------------- | ------------------------------------------------------------ |
- | apache-UDF-1.3.3.zip | V1.3.3 and above |Please contact staff for assistance |
- | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact staff for assistance|
+ | UDF installation package | Supported IoTDB versions |
+ | --------------- | ----------------- |
+ | apache-UDF-1.3.3.zip | V1.3.3 and above |
+ | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 |
 
 2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster
 3. In the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute the corresponding function registration statement as follows.
diff --git a/src/UserGuide/Master/Tree/User-Manual/Load-Balance.md b/src/UserGuide/Master/Tree/User-Manual/Load-Balance.md index c9d215c0a..355c06f2d 100644 --- a/src/UserGuide/Master/Tree/User-Manual/Load-Balance.md +++ b/src/UserGuide/Master/Tree/User-Manual/Load-Balance.md @@ -211,7 +211,7 @@ Total line number = 4 It costs 0.110s ``` -7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. Otherwise, you will need to contact the support team to reapply for activation code information. +7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. #### 1.3.3 Manual Load Balancing (Optional) diff --git a/src/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md b/src/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md deleted file mode 100644 index 4f0ff68bd..000000000 --- a/src/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md +++ /dev/null @@ -1,158 +0,0 @@ - - -# Monitoring Board Install and Deploy -From the Apache IoTDB 1.0 version, we introduced the system monitoring module, you can complete the Apache IoTDB important operational indicators for monitoring, this article describes how to open the system monitoring module in the Apache IoTDB distribution, and the use of Prometheus + Grafana way to complete the visualisation of the system monitoring indicators. - -## pre-preparation - -### software requirement - -1. Apache IoTDB: version 1.0 and above, download from the official website: https://iotdb.apache.org/Download/ -2. Prometheus: version 2.30.3 and above, download from the official website: https://prometheus.io/download/ -3. 
Grafana: version 8.4.2 and above, download from the official website: https://grafana.com/grafana/download -4. IoTDB-Grafana installer: Grafana Dashboards is an IoTDB(Enterprise Edition based on IoTDB) tool, and you may contact your sales for the relevant installer. - -### cluster requirement - -Make sure that the IoTDB cluster is started before doing the following. - -### clarification - -This doc will build the monitoring dashboard on one machine (1 ConfigNode and 1 DataNode) environment, other cluster configurations are similar, users can adjust the configuration according to their own cluster situation (the number of ConfigNode and DataNode). The basic configuration information of the cluster built in this paper is shown in the table below. - -| NODETYPE | NODEIP | Monitor Pusher | Monitor Level | Monitor Port | -| ---------- | --------- | -------------- | ------------ | --------- | -| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9091 | -| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9093 | - -## configure Prometheus capture monitoring metrics - -1. Download the installation package. Download the Prometheus binary package locally, unzip it and go to the corresponding folder: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -2. Modify the configuration. Modify the Prometheus configuration file prometheus.yml as follows: - a. Added confignode task to collect monitoring data from ConfigNode - b. Add datanode task to collect monitoring data from DataNode - -```YAML -global: - scrape_interval: 15s - -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["localhost:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["localhost:9093"] - honor_labels: true -``` - -3. Start Promethues. the default expiration time for Prometheus monitoring data is 15d. 
in production environments, it is recommended to adjust the expiration time to 180d or more in order to track historical monitoring data for a longer period of time, as shown in the following startup command: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -4. Confirm the startup is successful. Enter http://localhost:9090 in the browser to enter Prometheus, click to enter the Target interface under Status (Figure 1 below), when you see State are Up, it means the configuration is successful and connected (Figure 2 below), click the link on the left side to jump to the webpage monitoring. - -![](/img/1a.png) -![](/img/2a.png) - - - -## Using Grafana to View Monitoring Data - -### Step1:Grafana Installation, Configuration and Startup - -1. Download the binary package of Grafana locally, unzip it and go to the corresponding folder: - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -2. Start Grafana and enter: - -```Shell -./bin/grafana-server web -``` - -3. Enter http://localhost:3000 in your browser to access Grafana, the default initial username and password are both admin. -4. First we configure the Data Source in Configuration to be Prometheus. - -![](/img/3a.png) - -5. When configuring the Data Source, pay attention to the URL where Prometheus is located, and click Save & Test after configuration, the Data source is working prompt appears, then the configuration is successful. - -![](/img/4a.png) - -### Step2:Use the official Grafana dashboard provided by IoTDB - -1. Enter Grafana,click Browse of Dashboards - -![](/img/5a.png) - -2. Click the Import button on the right - -![](/img/6a.png) - -3. Select a way to import Dashboard - a. Upload the Json file of the downloaded Dashboard locally - b. Enter the URL or ID of the Dashboard obtained from the Grafana website - c. Paste the contents of the Dashboard's Json file - -![](/img/7a.png) - -4. 
Select Prometheus in the Dashboard as the Data Source you just configured and click Import - -![](/img/8a.png) - -5. Then enter Dashboard,select job to be ConfigNode,then following monitoring dashboard will be seen: - -![](/img/9a.png) - -6. Similarly, we can import the Apache DataNode Dashboard, select job as DataNode,then following monitoring dashboard will be seen: - -![](/img/10a.png) - -### Step3:Creating a new Dashboard for data visualisation - -1. First create the Dashboard, then create the Panel. - -![](/img/11a.png) - -2. After that, you can visualize the monitoring-related data in the panel according to your needs (all relevant monitoring metrics can be filtered by selecting confignode/datanode in the job first). - -![](/img/12a.png) - -3. Once the visualisation of the monitoring metrics selected for attention is complete, we get a panel like this: - -![](/img/13a.png) \ No newline at end of file diff --git a/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md b/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md index a634210f9..55d90da81 100644 --- a/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md +++ b/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md @@ -29,9 +29,9 @@ Based on the ability of user-defined functions, IoTDB provides a series of funct 1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - | UDF libraries version | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | IoTDB-UDF-1.2.x.zip | V1.0.0~V1.2.x |Please contact staff for assistance| + | UDF libraries version | Supported IoTDB versions | + | --------------- | ----------------- | + | IoTDB-UDF-1.2.x.zip | V1.0.0~V1.2.x | 2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster 3. 
In the SQL command line terminal (CLI) or visualization console (Workbench) SQL operation interface of IoTDB, execute the corresponding function registration statement as follows. diff --git a/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md index 5b2204c4a..5fda11868 100644 --- a/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -274,7 +274,7 @@ On each server, two yml files need to be written, namely confignnode. yml and da version: "3" services: iotdb-confignode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used + image: apache/iotdb:1.3.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-confignode command: ["bash", "-c", "entrypoint.sh confignode"] @@ -309,7 +309,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used + image: apache/iotdb:1.3.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] @@ -430,4 +430,4 @@ Step 3: Restart IoTDB on 3 servers cd /docker-iotdb docker-compose -f confignode.yml up -d docker-compose -f datanode.yml up -d -``` \ No newline at end of file +``` diff --git a/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 17fced6e9..000000000 --- a/src/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,682 +0,0 @@ - -# Monitoring Panel Deployment - -The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. 
It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. - -The instructions for using the monitoring panel tool can be found in the [Instructions](../Tools-System/Monitor-Tool.md) section of the document. - -## Installation Preparation - -1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain -2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain - -## Installation Steps - -### Step 1: IoTDB enables monitoring indicator collection - -1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
- -| **Configuration** | Located in the configuration file | **Description** | -| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other | -| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and set it to 9092 by default. 
If other ports are set, they will not conflict with each other | - -Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: - -| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | -| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: - -```Bash -./sbin/stop-standalone.sh #Stop confignode and datanode first -./sbin/start-confignode.sh -d #Start confignode -./sbin/start-datanode.sh -d #Start datanode -``` - -3. After restarting, confirm the running status of each node through the client. 
If the status is Running, it indicates successful configuration: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### Step 2: Install and configure Prometheus - -> Taking Prometheus installed on server 192.168.1.3 as an example. - -1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the unzipped folder: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. Modify the configuration. Modify the configuration file prometheus.yml as follows - 1. Add configNode task to collect monitoring data for ConfigNode - 2. Add a datanode task to collect monitoring data for DataNodes - -```YAML -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. - -
- - -
- -6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### Step 3: Install Grafana and configure the data source - -> Taking Grafana installed on server 192.168.1.3 as an example. - -1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) -2. Unzip and enter the corresponding folder - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. Start Grafana: - -```Shell -./bin/grafana-server web -``` - -4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. - -5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### Step 4: Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import Dashboard using upload JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 
Select Prometheus as the data source and click Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: - -
- - - -
- -8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## Appendix, Detailed Explanation of Monitoring Indicators - -### System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Core:CPU cores -- CPU Load: - - System CPU Load:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited vm size: The size of virtual memory allocated by the operating system to running processes. - - Total physical memory:The total amount of available physical memory in the system. - - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total disk space:The maximum disk space that IoTDB can use. - - Used disk space:The disk space already used by IoTDB. -- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. 
-- File Count:Number of IoTDB related files - - all:All file quantities - - TsFile:Number of TsFiles - - seq:Number of sequential TsFiles - - unseq:Number of unsequence TsFiles - - wal:Number of WAL files - - cross-temp:Number of cross space merge temp files - - inner-seq-temp:Number of merged temp files in sequential space - - innser-unseq-temp:Number of merged temp files in unsequential space - - mods:Number of tombstone files -- Open File Count:Number of file handles opened by the system -- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. -- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. -- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. 
-- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. - -#### JVM - -- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications -- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value -- Heap Memory:JVM heap memory usage. - - Maximum heap memory:The maximum available heap memory size for the JVM. - - Committed heap memory:The size of heap memory that has been committed by the JVM. - - Used heap memory:The size of heap memory already used by the JVM. - - PS Eden Space:The size of the PS Young area. - - PS Old Space:The size of the PS Old area. - - PS Survivor Space:The size of the PS survivor area. - - ...(CMS/G1/ZGC, etc) -- Off Heap Memory:Out of heap memory usage. - - direct memory:Out of heap direct memory. - - mapped memory:Out of heap mapped memory. 
-- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute -- The Number of Class: - - loaded:The number of classes currently loaded by the JVM - - unloaded:The number of classes uninstalled by the JVM since system startup -- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
- -- Net Speed:The speed of network card sending and receiving data -- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart -- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) - -### Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Core:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster -- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster -- Total File Number:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBase: The total number of databases managed by the cluster (including replicas) -- Total DataRegion: The total number of DataRegions managed by the cluster -- Total SchemaRegion: The total number of SchemeRegions managed by the cluster - -#### Node Overview - 
-- CPU Core: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Timeseries: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory:The swap memory size of the machine where the node is located -- File Number: Number of files managed by nodes - -#### Performance - -- Session Idle Time:The total idle time and total busy time of the session connection of the node -- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections -- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 -- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node -- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes -- Task Number: The number of system tasks for each node -- Average Time Consumed of Task: The average time spent on various system tasks of a node -- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes -- Operation Per Second: The number of operations per second for a node -- Mainstream Process - - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process - - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node - - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process -- Schedule Stage - - OPS Of Schedule: The number of 
operations per second in each sub stage of the node schedule stage - - Average Time Consumed Of Schedule Stage: The average time consumption of each sub stage in the node schedule stage - - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule stage of the node - - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage - - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage - - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage - - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node - - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Load: CPU load of nodes -- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Time Per Minute: The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off Heap Memory: Non heap memory usage of nodes -- The Number Of Java Thread: Number of Java threads on nodes -- File Count: Number of files managed by nodes -- File Size: The size of files managed by nodes -- Log Number Per Minute: The number of logs of each type per minute for nodes - -### ConfigNode Dashboard - -This
panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count: Number of DataRegions for nodes - - DataRegion Current Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemaRegions for nodes - - SchemaRegion Current Status: The state of the SchemaRegion of the node -- System Memory: The system memory size of the node -- Swap Memory: Node's swap memory size -- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located -- DataNodes: The DataNode situation of the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution: The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count: Number of borrowed clients in each thread pool of the node - - Created Client Count: Number of created clients for each
thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node -- Client time situation - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemaRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Time: The time consumption of each stage of the node's Ratis -- Write Log Entry: The time required to write a log for the Ratis of a node -- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write QPS: The QPS of remote and local writes for the Ratis of nodes -- RatisConsensus Memory: Memory usage of the node's Ratis consensus protocol - -### DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc.
- -#### Node Overview - -- The Number Of Entity: The number of entities managed by the node -- Write Point Per Second: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - The Time Consumed Of Operation (avg): The average time spent on various operations of a node - - The Time Consumed Of Operation (50%): The median time spent on various operations of a node - - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes -- Thrift Statistics - - The QPS Of Interface: QPS of various Thrift interfaces of nodes - - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node - - Thrift Connection: The number of Thrift connections of each type of node - - Thrift Active Thread: The number of active Thrift connections for each type of node -- Client Statistics - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count: Number of borrowed clients for each thread pool of a node - - Created Client Count: Number of created clients for each thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by nodes -- File Size: The size of files of various types managed by nodes -- TsFile - - TsFile Total Size In Each Level: The total size of
TsFile files at each level of node management - - TsFile Count In Each Level: Number of TsFile files at each level of node management - - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management -- Task Number: Number of Tasks for Nodes -- The Time Consumed of Task: The time consumption of tasks for nodes -- Compaction - - Compaction Read And Write Per Second: The compaction read and write speed of nodes per second - - Compaction Number Per Minute: The number of compactions performed by nodes per minute - - Compaction Process Chunk Status: The number of Chunks in different states compacted by nodes - - Compacted Point Num Per Minute: The number of points compacted by nodes per minute - -#### Write Performance - -- Write Cost(avg): Average node write time, including writing wal and memtable -- Write Cost(50%): Median node write time, including writing wal and memtable -- Write Cost(99%): P99 for node write time, including writing wal and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL File Num: Number of WAL files managed by nodes - - WAL Nodes Num: Number of WAL nodes managed by nodes - - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes - - WAL Serialize Total Cost: Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Cost: The time taken for the node's WAL to flush the SyncBuffer, including both
synchronous and asynchronous options - - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage - - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage - - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage - - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages - - Flush Sub Task Cost(50%): The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - Flush Sub Task Cost(99%): The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes -- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flushing Memtable -- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions -- Size Of Flushing MemTable: The size of the Memtable for node disk flushing -- Points Num Of Flushing MemTable: The number of points when flushing data in different DataRegions of a node -- Series Num Of Flushing MemTable: The number of time series when flushing Memtables in different DataRegions of a node -- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk - -#### Schema Engine - -- Schema Engine Mode: The metadata engine mode of nodes -- Schema Consensus Protocol: Node metadata consensus protocol -- Schema Region Number: Number of SchemaRegions managed by nodes -- Schema Region Memory Overview: The amount of
memory in the SchemaRegion of a node -- Memory Usgae per SchemaRegion: The average memory usage size of node SchemaRegion -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemaRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemaRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series per SchemaRegion of the node - - Series Type: Number of time series of different types of nodes - - Time Series Number: The total number of time series of the node - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemaRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: The number of pinned IMNode nodes in each SchemaRegion of a node - - Pinned Memory per SchemaRegion: The memory usage of pinned IMNode nodes in each SchemaRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemaRegion of a node - - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemaRegion of the node - - Schema File Memory MNode Number: The global number of pinned and unpinned IMNode nodes - - Release and Flush MNode Rate: The number of IMNodes released and flushed by the node per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Relead and Flush (99%): P99
time consumption for node triggered cache release and buffer flushing - -#### Query Engine - -- Time Consumption In Each Stage - - The time consumed of query plan stages(avg): The average time spent on node queries at each stage - - The time consumed of query plan stages(50%): Median time spent on node queries at each stage - - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time -- Execution Plan Execution Time - - The time consumed of query execution stages(avg): The average execution time of node query execution plan - - The time consumed of query execution stages(50%):Median execution time of node query execution plan - - The time consumed of query execution stages(99%): P99 of node query execution plan execution time -- Operator Execution Time - - The time consumed of operator execution stages(avg): The average execution time of node query operators - - The time consumed of operator execution(50%): Median execution time of node query operator - - The time consumed of operator execution(99%): P99 of node query operator execution time -- Aggregation Query Computation Time - - The time consumed of query aggregation(avg): The average computation time for node aggregation queries - - The time consumed of query aggregation(50%): Median computation time for node aggregation queries - - The time consumed of query aggregation(99%): P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes - - The time consumed of query scan(50%): 
Median time spent querying file/memory interfaces for nodes - - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - The usage of query resource(avg): The average number of resource visits for node queries - - The usage of query resource(50%): Median number of resource visits for node queries - - The usage of query resource(99%): P99 for node query resource access quantity -- Data Transmission Time - - The time consumed of query data exchange(avg): The average time spent on node query data transmission - - The time consumed of query data exchange(50%): Median query data transmission time for nodes - - The time consumed of query data exchange(99%): P99 for node query data transmission time -- Number Of Data Transfers - - The count of Data Exchange(avg): The average number of data transfers queried by nodes - - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - The number of query queue: Node query task scheduling quantity - - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks - - The time consumed of query schedule time(50%): Median time spent on node query task scheduling - - The time consumed of query schedule time(99%): P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata - - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries - - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata -- Read Time Series - - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series - - The time 
consumed of read timeseries metadata(50%): The median time taken for node queries to read time series - - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series -- Modify Time Series Metadata - - The time consumed of timeseries metadata modification(avg): The average time taken for node queries to modify time series metadata - - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes - - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata -- Load Chunk Metadata List - - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists - - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list - - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list -- Modify Chunk Metadata - - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata - - The time consumed of chunk metadata modification(50%): The median time spent on modifying Chunk metadata for node queries - - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata -- Filter According To Chunk Metadata - - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata - - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata - - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata -- Constructing Chunk Reader - - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader
for node queries - - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries - - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader for node queries -- Read Chunk - - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks - - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks - - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes -- Initialize Chunk Reader - - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(99%): P99 time spent initializing Chunk Reader for node queries -- Constructing TsBlock Through Page Reader - - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader - - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries - - The time consumed of build tsblock from page reader(99%): P99 time spent on constructing TsBlock through Page Reader for node queries -- Constructing TsBlock Through Merge Reader - - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries - - The time consumed of build tsblock from merge reader(99%): P99 time spent on constructing TsBlock through Merge Reader for node queries - -#### Query Data Exchange - -Time consumption of data exchange for queries.
- -- Obtain TsBlock through source handle - - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle - - The time consumed of source handle get tsblock(50%): The median time taken for node queries to obtain TsBlock through source handle - - The time consumed of source handle get tsblock(99%): P99 time taken for node queries to obtain TsBlock through source handle -- Deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query -- Send TsBlock through sink handle - - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle - - The time consumed of sink handle send tsblock(50%): The median time taken for node queries to send TsBlock through sink handle - - The time consumed of sink handle send tsblock(99%): P99 time taken for node queries to send TsBlock through sink handle -- Callback data block event - - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event - - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event - - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event -- Get Data Block Tasks - - The time consumed of get data block task(avg): The average time taken for node queries to obtain data block tasks - - The time consumed of get data block task(50%): The median time taken for node queries to obtain data
block tasks - - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block task - -#### Query Related Resource - -- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries -- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards -- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running -- Coordinator: The number of queries recorded on the node -- MemoryPool Size: Node query related memory pool situation -- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values -- DriverScheduler: Number of queue tasks related to node queries - -#### Consensus - IoT Consensus - -- Memory Usage - - IoTConsensus Used Memory: The memory usage of IoT Consensus for nodes, including total memory usage, queue usage, and synchronization usage -- Synchronization Status Between Nodes - - IoTConsensus Sync Index: SyncIndex size for different DataRegions of the node's IoT Consensus - - IoTConsensus Overview: The total synchronization gap and cached request count of IoT Consensus for nodes - - IoTConsensus Search Index Rate: The growth rate of the written SearchIndex for different DataRegions of the node's IoT Consensus - - IoTConsensus Safe Index Rate: The growth rate of the synchronized SafeIndex for different DataRegions of the node's IoT Consensus - - IoTConsensus LogDispatcher Request Size: The request size for the node's IoT Consensus to synchronize different DataRegions to other nodes - - Sync Lag: The size of the synchronization gap of different DataRegions in the node's IoT Consensus - - Min Peer Sync Lag: The minimum synchronization gap from different DataRegions to different replicas in the node's IoT Consensus - - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas in the node's IoT Consensus - - IoTConsensus
LogEntriesFromWAL Rate: The rate at which the node's IoT Consensus obtains logs from the WAL for different DataRegions - - IoTConsensus LogEntriesFromQueue Rate: The rate at which the node's IoT Consensus obtains logs from the queue for different DataRegions -- Different Execution Stages Take Time - - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of the node's IoT Consensus - - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of the node's IoT Consensus - - The Time Consumed Of Different Stages (99%): P99 of the time consumption for different execution stages of the node's IoT Consensus - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Stage Time: The time consumption of different stages of node Ratis -- Write Log Entry: The time consumption of writing logs at different stages of node Ratis -- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write QPS: QPS written by node Ratis locally or remotely -- RatisConsensus Memory: Memory usage of node Ratis - -#### Consensus - SchemaRegion Ratis Consensus - -- Ratis Stage Time: The time consumption of different stages of node Ratis -- Write Log Entry: The time consumption for writing logs at each stage of node Ratis -- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write QPS: QPS written by node Ratis locally or remotely -- RatisConsensus Memory: Node Ratis Memory Usage \ No newline at end of file diff --git a/src/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md b/src/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md index 13bb431f3..8e673c5c6 100644 --- a/src/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md +++ b/src/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md @@ -43,12 +43,12 @@ | :-------------------- | :----------------------------------------------------------- | |
IoTDB | Version not required, please refer to [Deployment Guidance](../QuickStart/QuickStart_apache.md) | | JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) | -| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact staff) | -| DataEase-IoTDB Connector | Please contact staff for assistance | +| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported) | +| DataEase-IoTDB Connector | Obtain the installation package | ## Installation Steps -Step 1: Please contact staff to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip` +Step 1: Unzip the installation package `iotdb-api-source-1.0.0.zip` Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder diff --git a/src/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md index b5e580d9d..9d573b90e 100644 --- a/src/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md +++ b/src/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md @@ -42,13 +42,13 @@ | :---------------------------------------- | :----------------------------------------------------------- | | JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) | | IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package.md) | -| ThingsBoard
(IoTDB adapted version) | Please contact commercial support to obtain the installation package. Detailed installation steps are provided below. | +| ThingsBoard
(IoTDB adapted version) | Obtain the installation package. Detailed installation steps are provided below. | ## Installation Steps Please refer to the installation steps on [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/),wherein: -- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the installation package provided by your contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. +- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the obtained installation package to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. - [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/) 【Step 3: Configure ThingsBoard Database - ThingsBoard Configuration】 In this step, you need to add environment variables according to the following content ```Shell diff --git a/src/UserGuide/V1.3.x/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/V1.3.x/IoTDB-Introduction/Commercial-Support_apache.md index 349dcb064..976e153c5 100644 --- a/src/UserGuide/V1.3.x/IoTDB-Introduction/Commercial-Support_apache.md +++ b/src/UserGuide/V1.3.x/IoTDB-Introduction/Commercial-Support_apache.md @@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver | | Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer | | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member | | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member | diff --git a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md index 34775eb04..56d3400a6 100644 --- a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md +++ b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md @@ -31,10 +31,10 @@ Based on the ability of user-defined functions, IoTDB provides a series of funct 1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - | UDF installation package | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3 and above |Please contact staff for assistance | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact staff for assistance| + | UDF installation package | Supported IoTDB versions | + | --------------- | ----------------- | + | apache-UDF-1.3.3.zip | V1.3.3 and above | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster 3. In the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute the corresponding function registration statement as follows. 
diff --git a/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md index 5b2204c4a..9816bda03 100644 --- a/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -274,7 +274,7 @@ On each server, two yml files need to be written, namely confignnode. yml and da version: "3" services: iotdb-confignode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used + image: apache/iotdb:1.3.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-confignode command: ["bash", "-c", "entrypoint.sh confignode"] @@ -309,7 +309,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used + image: apache/iotdb:1.3.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 17fced6e9..000000000 --- a/src/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,682 +0,0 @@ - -# Monitoring Panel Deployment - -The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. 
This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. - -The instructions for using the monitoring panel tool can be found in the [Instructions](../Tools-System/Monitor-Tool.md) section of the document. - -## Installation Preparation - -1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain -2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain - -## Installation Steps - -### Step 1: IoTDB enables monitoring indicator collection - -1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). - -| **Configuration** | Located in the configuration file | **Description** | -| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item to maintain the default setting of 9091. 
If other ports are set, they will not conflict with each other | -| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other | - -Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: - -| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | -| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. Restart all nodes. 
After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: - -```Bash -./sbin/stop-standalone.sh #Stop confignode and datanode first -./sbin/start-confignode.sh -d #Start confignode -./sbin/start-datanode.sh -d #Start datanode -``` - -3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### Step 2: Install and configure Prometheus - -> Taking Prometheus installed on server 192.168.1.3 as an example. - -1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the unzipped folder: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. Modify the configuration. Modify the configuration file prometheus.yml as follows - 1. Add configNode task to collect monitoring data for ConfigNode - 2. Add a datanode task to collect monitoring data for DataNodes - -```YAML -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. 
Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. - -
- - -
- -6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### Step 3: Install Grafana and configure the data source - -> Taking Grafana installed on server 192.168.1.3 as an example. - -1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) -2. Unzip and enter the corresponding folder - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. Start Grafana: - -```Shell -./bin/grafana-server web -``` - -4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. - -5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### Step 4: Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import Dashboard using upload JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 
Select Prometheus as the data source and click Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: - -
- - - -
- -8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## Appendix, Detailed Explanation of Monitoring Indicators - -### System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Core:CPU cores -- CPU Load: - - System CPU Load:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited vm size: The size of virtual memory allocated by the operating system to running processes. - - Total physical memory:The total amount of available physical memory in the system. - - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total disk space:The maximum disk space that IoTDB can use. - - Used disk space:The disk space already used by IoTDB. -- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. 
-- File Count:Number of IoTDB related files - - all:All file quantities - - TsFile:Number of TsFiles - - seq:Number of sequential TsFiles - - unseq:Number of unsequence TsFiles - - wal:Number of WAL files - - cross-temp:Number of cross space merge temp files - - inner-seq-temp:Number of merged temp files in sequential space - - innser-unseq-temp:Number of merged temp files in unsequential space - - mods:Number of tombstone files -- Open File Count:Number of file handles opened by the system -- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. -- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. -- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. 
-- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. - -#### JVM - -- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications -- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value -- Heap Memory:JVM heap memory usage. - - Maximum heap memory:The maximum available heap memory size for the JVM. - - Committed heap memory:The size of heap memory that has been committed by the JVM. - - Used heap memory:The size of heap memory already used by the JVM. - - PS Eden Space:The size of the PS Young area. - - PS Old Space:The size of the PS Old area. - - PS Survivor Space:The size of the PS survivor area. - - ...(CMS/G1/ZGC, etc) -- Off Heap Memory:Out of heap memory usage. - - direct memory:Out of heap direct memory. - - mapped memory:Out of heap mapped memory. 
-- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute -- The Number of Class: - - loaded:The number of classes currently loaded by the JVM - - unloaded:The number of classes uninstalled by the JVM since system startup -- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
- -- Net Speed:The speed of network card sending and receiving data -- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart -- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) - -### Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Core:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster -- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster -- Total File Number:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBase: The total number of databases managed by the cluster (including replicas) -- Total DataRegion: The total number of DataRegions managed by the cluster -- Total SchemaRegion: The total number of SchemeRegions managed by the cluster - -#### Node Overview - 
-- CPU Core: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Timeseries: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory:The swap memory size of the machine where the node is located -- File Number: Number of files managed by nodes - -#### Performance - -- Session Idle Time:The total idle time and total busy time of the session connection of the node -- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections -- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 -- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node -- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes -- Task Number: The number of system tasks for each node -- Average Time Consumed of Task: The average time spent on various system tasks of a node -- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes -- Operation Per Second: The number of operations per second for a node -- Mainstream Process - - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process - - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node - - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process -- Schedule Stage - - OPS Of Schedule: The number of 
operations per second in each sub stage of the node schedule stage - - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage - - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node - - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage - - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage - - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage - - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node - - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Load: CPU load of nodes -- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off Heap Memory: Non heap memory usage of nodes -- The Number Of Java Thread: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Log Number Per Minute: Different types of logs per minute for nodes - -### ConfigNode Dashboard - -This 
panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count:Number of DataRegions for nodes - - DataRegion Current Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemeRegions for nodes - - SchemaRegion Current Status: The state of the SchemeRegion of the node -- System Memory: The system memory size of the node -- Swap Memory: Node's swap memory size -- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located -- DataNodes:The DataNode situation of the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count: Number of borrowed clients in each thread pool of the node - - Created Client Count: Number of created clients for each 
thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node -- Client time situation - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Time: The time consumption of each stage of the node's Ratis -- Write Log Entry: The time required to write a log for the Ratis of a node -- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write QPS: Remote and local QPS written to node Ratis -- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol - -### DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
-
-#### Node Overview
-
-- The Number Of Entity: The entities managed by the node
-- Write Point Per Second: The write speed per second of the node
-- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases.
-
-#### Protocol
-
-- Node Operation Time Consumption
-  - The Time Consumed Of Operation (avg): The average time spent on various operations of a node
-  - The Time Consumed Of Operation (50%): The median time spent on various operations of a node
-  - The Time Consumed Of Operation (99%): The P99 time spent on various operations of a node
-- Thrift Statistics
-  - The QPS Of Interface: QPS of various Thrift interfaces of nodes
-  - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node
-  - Thrift Connection: The number of Thrift connections of each type of node
-  - Thrift Active Thread: The number of active Thrift connections for each type of node
-- Client Statistics
-  - Active Client Num: The number of active clients in each thread pool of a node
-  - Idle Client Num: The number of idle clients in each thread pool of a node
-  - Borrowed Client Count: The number of borrowed clients in each thread pool of a node
-  - Created Client Count: The number of created clients in each thread pool of a node
-  - Destroyed Client Count: The number of destroyed clients in each thread pool of a node
-  - Client Mean Active Time: The average active time of clients in each thread pool of a node
-  - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node
-  - Client Mean Idle Time: The average idle time of clients in each thread pool of a node
-
-#### Storage Engine
-
-- File Count: The number of files of each type managed by the node
-- File Size: The size of files of each type managed by the node
-- TsFile
-  - TsFile Total Size In Each Level: The total size of TsFile files at each level managed by the node
-  - TsFile Count In Each Level: The number of TsFile files at each level managed by the node
-  - Avg TsFile Size In Each Level: The average size of TsFile files at each level managed by the node
-- Task Number: The number of tasks on the node
-- The Time Consumed of Task: The time consumption of tasks on the node
-- Compaction
-  - Compaction Read And Write Per Second: The compaction read and write speed of the node per second
-  - Compaction Number Per Minute: The number of compactions performed by the node per minute
-  - Compaction Process Chunk Status: The number of Chunks in different states processed by compaction on the node
-  - Compacted Point Num Per Minute: The number of points compacted by the node per minute
-
-#### Write Performance
-
-- Write Cost(avg): Average node write time, including writing WAL and MemTable
-- Write Cost(50%): Median node write time, including writing WAL and MemTable
-- Write Cost(99%): P99 of node write time, including writing WAL and MemTable
-- WAL
-  - WAL File Size: Total size of WAL files managed by nodes
-  - WAL File Num: Number of WAL files managed by nodes
-  - WAL Nodes Num: Number of WAL nodes managed by nodes
-  - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes
-  - WAL Serialize Total Cost: Total time spent on node WAL serialization
-  - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster
-  - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry
-  - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot
-  - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush
-  - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes
-  - WAL Buffer
-    - WAL Buffer Cost: The time taken by the node to flush the WAL SyncBuffer, including both synchronous and asynchronous options
-    - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node
-    - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node
-- Flush Statistics
-  - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage
-  - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage
-  - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage
-  - Flush Sub Task Cost(avg): The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
-  - Flush Sub Task Cost(50%): The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
-  - Flush Sub Task Cost(99%): The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
-- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node
-- Pending Flush Sub Task Num: The number of Flush subtasks in a blocked state for a node
-- Tsfile Compression Ratio Of Flushing MemTable: The compression ratio of the TsFile produced when the node flushes a MemTable
-- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions
-- Size Of Flushing MemTable: The size of the MemTable for node disk flushing
-- Points Num Of Flushing MemTable: The number of points when flushing data in different DataRegions of a node
-- Series Num Of Flushing MemTable: The number of time series when flushing MemTables in different DataRegions of a node
-- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk
-
-#### Schema Engine
-
-- Schema Engine Mode: The metadata engine mode of nodes
-- Schema Consensus Protocol: Node metadata consensus protocol
-- Schema Region Number: Number of SchemaRegions managed by nodes
-- Schema Region Memory Overview: The amount of memory in the SchemaRegion of a node
-- Memory Usgae per SchemaRegion: The average memory usage size of node SchemaRegion
-- Cache MNode per SchemaRegion: The number of cache nodes in each SchemaRegion of a node
-- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus)
-- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemaRegion of a node
-- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of a node
-- Time Series statistics
-  - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion
-  - Series Type: Number of time series of different types of nodes
-  - Time Series Number: The total number of time series of nodes
-  - Template Series Number: The total number of template time series for nodes
-  - Template Series Count per SchemaRegion: The number of series created through templates in each SchemaRegion of a node
-- IMNode Statistics
-  - Pinned MNode per SchemaRegion: The number of pinned IMNodes in each SchemaRegion of the node
-  - Pinned Memory per SchemaRegion: The memory usage of pinned IMNodes in each SchemaRegion of the node
-  - Unpinned MNode per SchemaRegion: The number of unpinned IMNodes in each SchemaRegion of the node
-  - Unpinned Memory per SchemaRegion: The memory usage of unpinned IMNodes in each SchemaRegion of the node
-  - Schema File Memory MNode Number: The global number of pinned and unpinned IMNodes on the node
-  - Release and Flush MNode Rate: The number of IMNodes released and flushed by the node per second
-- Cache Hit Rate: Cache hit rate of nodes
-- Release and Flush Thread Number: The current number of active Release and Flush threads on the node
-- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing
-- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing
-
-#### Query Engine
-
-- Time Consumption In Each Stage
-  - The time consumed of query plan stages(avg): The average time spent on node queries at each stage
-  - The time consumed of query plan stages(50%): Median time spent on node queries at each stage
-  - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage
-- Execution Plan Distribution Time
-  - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution
-  - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution
-  - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time
-- Execution Plan Execution Time
-  - The time consumed of query execution stages(avg): The average execution time of node query execution plan
-  - The time consumed of query execution stages(50%): Median execution time of node query execution plan
-  - The time consumed of query execution stages(99%): P99 of node query execution plan execution time
-- Operator Execution Time
-  - The time consumed of operator execution stages(avg): The average execution time of node query operators
-  - The time consumed of operator execution(50%): Median execution time of node query operator
-  - The time consumed of operator execution(99%): P99 of node query operator execution time
-- Aggregation Query Computation Time
-  - The time consumed of query aggregation(avg): The average computation time for node aggregation queries
-  - The time consumed of query aggregation(50%): Median computation time for node aggregation queries
-  - The time consumed of query aggregation(99%): P99 of node aggregation query computation time
-- File/Memory Interface Time Consumption
-  - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes
-  - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes
-  - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface
-- Number Of Resource Visits
-  - The usage of query resource(avg): The average number of resource visits for node queries
-  - The usage of query resource(50%): Median number of resource visits for node queries
-  - The usage of query resource(99%): P99 of the number of resource visits for node queries
-- Data Transmission Time
-  - The time consumed of query data exchange(avg): The average time spent on node query data transmission
-  - The time consumed of query data exchange(50%): Median query data transmission time for nodes
-  - The time consumed of query data exchange(99%): P99 for node query data transmission time
-- Number Of Data Transfers
-  - The count of Data Exchange(avg): The average number of data transfers queried by nodes
-  - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99
-- Task Scheduling Quantity And Time Consumption
-  - The number of query queue: Node query task scheduling quantity
-  - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks
-  - The time consumed of query schedule time(50%): Median time spent on node query task scheduling
-  - The time consumed of query schedule time(99%): P99 of node query task scheduling time
-
-#### Query Interface
-
-- Load Time Series Metadata
-  - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata
-  - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries
-  - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata
-- Read Time Series
-  - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series
-  - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series
-  - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series
-- Modify Time Series Metadata
-  - The time consumed of timeseries metadata modification(avg): The average time taken for node queries to modify time series metadata
-  - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes
-  - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata
-- Load Chunk Metadata List
-  - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists
-  - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list
-  - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list
-- Modify Chunk Metadata
-  - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata
-  - The time consumed of chunk metadata modification(50%): The median time spent on modifying Chunk metadata for node queries
-  - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata
-- Filter According To Chunk Metadata
-  - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata
-  - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata
-  - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata
-- Constructing Chunk Reader
-  - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries
-  - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries
-  - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader for node queries
-- Read Chunk
-  - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks
-  - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks
-  - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes
-- Initialize Chunk Reader
-  - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries
-  - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries
-  - The time consumed of init chunk reader(99%): P99 time spent initializing Chunk Reader for node queries
-- Constructing TsBlock Through Page Reader
-  - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader
-  - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries
-  - The time consumed of build tsblock from page reader(99%): The P99 time spent on constructing TsBlock through Page Reader for node queries
-- Constructing TsBlock Through Merge Reader
-  - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader
-  - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries
-  - The time consumed of build tsblock from merge reader(99%): The P99 time spent on constructing TsBlock through Merge Reader for node queries
-
-#### Query Data Exchange
-
-Time consumption of data exchange for queries.
-
-- Obtain TsBlock through source handle
-  - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle
-  - The time consumed of source handle get tsblock(50%): The median time taken for node queries to obtain TsBlock through source handle
-  - The time consumed of source handle get tsblock(99%): The P99 time taken for node queries to obtain TsBlock through source handle
-- Deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query
-- Send TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(50%): The median time taken for node queries to send TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(99%): The P99 time taken for node queries to send TsBlock through sink handle
-- Callback data block event
-  - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event
-  - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event
-  - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event
-- Get Data Block Tasks
-  - The time consumed of get data block task(avg): The average time taken for node queries to obtain data block tasks
-  - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks
-  - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block task
-
-#### Query Related Resource
-
-- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries
-- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards
-- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running
-- Coordinator: The number of queries recorded on the node
-- MemoryPool Size: Node query related memory pool situation
-- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values
-- DriverScheduler: Number of queue tasks related to node queries
-
-#### Consensus - IoT Consensus
-
-- Memory Usage
-  - IoTConsensus Used Memory: The memory usage of IoT Consensus on the node, including total memory usage, queue usage, and synchronization usage
-- Synchronization Status Between Nodes
-  - IoTConsensus Sync Index: The SyncIndex size of different DataRegions for the node's IoT Consensus
-  - IoTConsensus Overview: The total synchronization gap and cached request count of IoT Consensus on the node
-  - IoTConsensus Search Index Rate: The growth rate of the write SearchIndex of different DataRegions for the node's IoT Consensus
-  - IoTConsensus Safe Index Rate: The growth rate of the synchronization SafeIndex of different DataRegions for the node's IoT Consensus
-  - IoTConsensus LogDispatcher Request Size: The request size for the node's IoT Consensus to synchronize different DataRegions to other nodes
-  - Sync Lag: The size of the synchronization gap of different DataRegions for the node's IoT Consensus
-  - Min Peer Sync Lag: The minimum synchronization gap from different DataRegions to different replicas for the node's IoT Consensus
-  - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas for the node's IoT Consensus
-  - IoTConsensus LogEntriesFromWAL Rate: The rate at which different DataRegions of the node's IoT Consensus obtain logs from the WAL
-  - IoTConsensus LogEntriesFromQueue Rate: The rate at which different DataRegions of the node's IoT Consensus obtain logs from the queue
-- Different Execution Stages Take Time
-  - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of the node's IoT Consensus
-  - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of the node's IoT Consensus
-  - The Time Consumed Of Different Stages (99%): The P99 time spent on different execution stages of the node's IoT Consensus
-
-#### Consensus - DataRegion Ratis Consensus
-
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption of writing logs at different stages of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory: Memory usage of node Ratis
-
-#### Consensus - SchemaRegion Ratis Consensus
-
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption for writing logs at each stage of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory: Memory usage of node Ratis
\ No newline at end of file
diff --git a/src/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md b/src/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md
index 13bb431f3..8e673c5c6 100644
--- a/src/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md
+++ b/src/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md
@@ -43,12 +43,12 @@
 | :-------------------- | :----------------------------------------------------------- |
| IoTDB | Version not required, please refer to [Deployment Guidance](../QuickStart/QuickStart_apache.md) | | JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) | -| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact staff) | -| DataEase-IoTDB Connector | Please contact staff for assistance | +| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported) | +| DataEase-IoTDB Connector | Obtain the installation package | ## Installation Steps -Step 1: Please contact staff to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip` +Step 1: Unzip the installation package `iotdb-api-source-1.0.0.zip` Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder diff --git a/src/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md index b5e580d9d..9d573b90e 100644 --- a/src/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md +++ b/src/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md @@ -42,13 +42,13 @@ | :---------------------------------------- | :----------------------------------------------------------- | | JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) | | IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package.md) | -| ThingsBoard
(IoTDB adapted version) | Please contact commercial support to obtain the installation package. Detailed installation steps are provided below. | +| ThingsBoard
(IoTDB adapted version) | Obtain the installation package. Detailed installation steps are provided below. | ## Installation Steps Please refer to the installation steps on [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/),wherein: -- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the installation package provided by your contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. +- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the obtained installation package to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. - [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/) 【Step 3: Configure ThingsBoard Database - ThingsBoard Configuration】 In this step, you need to add environment variables according to the following content ```Shell diff --git a/src/UserGuide/dev-1.3/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/dev-1.3/IoTDB-Introduction/Commercial-Support_apache.md index 349dcb064..976e153c5 100644 --- a/src/UserGuide/dev-1.3/IoTDB-Introduction/Commercial-Support_apache.md +++ b/src/UserGuide/dev-1.3/IoTDB-Introduction/Commercial-Support_apache.md @@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver | | Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer | | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member | | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member | diff --git a/src/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md index 830b77c53..6ea9628e6 100644 --- a/src/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md +++ b/src/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md @@ -31,10 +31,10 @@ Based on the ability of user-defined functions, IoTDB provides a series of funct 1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - | UDF installation package | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3 and above |Please contact staff for assistance | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact staff for assistance| + | UDF installation package | Supported IoTDB versions | + | --------------- | ----------------- | + | apache-UDF-1.3.3.zip | V1.3.3 and above | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster 3. In the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute the corresponding function registration statement as follows. 
diff --git a/src/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md b/src/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md
index 820bf5e6f..375db1e7f 100644
--- a/src/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md
+++ b/src/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md
@@ -21,8 +21,6 @@
 
 # Monitor Tool
 
-The deployment of monitoring tools can refer to the document [Monitoring Panel Deployment](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) section.
-
 ## Prometheus
 
 ### The mapping from metric type to prometheus format
diff --git a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md
index ac8e6c0da..b31477b4f 100644
--- a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md
+++ b/src/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md
@@ -275,7 +275,7 @@ On each server, create two YML files: `confignode.yml` and `datanode.yml`. Examp
 version: "3"
 services:
   iotdb-confignode:
-    image: iotdb-enterprise:2.0.x-standalone #The image used
+    image: apache/iotdb:2.0.x-standalone #The image used
     hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation
     container_name: iotdb-confignode
     command: ["bash", "-c", "entrypoint.sh confignode"]
@@ -310,7 +310,7 @@ services:
 version: "3"
 services:
   iotdb-datanode:
-    image: iotdb-enterprise:2.0.x-standalone #The image used
+    image: apache/iotdb:2.0.x-standalone #The image used
     hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation
     container_name: iotdb-datanode
     command: ["bash", "-c", "entrypoint.sh datanode"]
diff --git a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
deleted file mode 100644
index 3bafe066a..000000000
--- a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ /dev/null
@@ -1,694 +0,0 @@
-
-# Monitoring Panel Deployment
-
-The monitoring panel is one of the supporting tools for IoTDB. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring metrics, in order to help users monitor cluster health, optimize performance, and perform maintenance. This guide demonstrates how to enable the system monitoring module in an IoTDB instance and visualize monitoring metrics using Prometheus + Grafana, using a typical 3C3D cluster (3 ConfigNodes and 3 DataNodes) as an example.
-
-## 1. Installation Preparation
-
-1. Installing IoTDB: Install IoTDB V1.0 or above. Contact sales or technical support to obtain the installation package.
-
-2. Obtain the monitoring panel installation package: The monitoring panel is exclusive to the enterprise-grade IoTDB. Contact sales or technical support to obtain it.
-
-## 2. Installation Steps
-
-### 2.1 Enable Monitoring Metrics Collection in IoTDB
-
-1. Enable related configuration options. The configuration options related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to enable certain configuration options (note that the service needs to be restarted after enabling monitoring configuration).
-
-| **Configuration** | **Configuration File** | **Description** |
-| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- |
-| cn_metric_reporter_list | conf/iotdb-confignode.properties | Uncomment the configuration option and set the value to PROMETHEUS |
-| cn_metric_level | conf/iotdb-confignode.properties | Uncomment the configuration option and set the value to IMPORTANT |
-| cn_metric_prometheus_reporter_port | conf/iotdb-confignode.properties | Uncomment the configuration option and keep the default port `9091` or set another port (ensure no conflict) |
-| dn_metric_reporter_list | conf/iotdb-datanode.properties | Uncomment the configuration option and set the value to PROMETHEUS |
-| dn_metric_level | conf/iotdb-datanode.properties | Uncomment the configuration option and set the value to IMPORTANT |
-| dn_metric_prometheus_reporter_port | conf/iotdb-datanode.properties | Uncomment the configuration option and keep the default port `9092` or set another port (ensure no conflict) |
-
-Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows:
-
-| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration |
-| ----------- | --------- | ------------ | ---------------------------- | ------------------------------------------------------------ |
-| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-
-2. Restart all nodes. After modifying the monitoring configurations on all 3 nodes, restart the ConfigNode and DataNode services:
-
-```Bash
-  # Unix/OS X
-  ./sbin/stop-standalone.sh #Stop confignode and datanode first
-  ./sbin/start-confignode.sh -d #Start confignode
-  ./sbin/start-datanode.sh -d #Start datanode
-
-  # Windows
-  # Before version V2.0.4.x
-  .\sbin\stop-standalone.bat
-  .\sbin\start-confignode.bat
-  .\sbin\start-datanode.bat
-
-  # V2.0.4.x and later versions
-  .\sbin\windows\stop-standalone.bat
-  .\sbin\windows\start-confignode.bat
-  .\sbin\windows\start-datanode.bat
-  ```
-
-3. After restarting, confirm the running status of each node through the client. If all nodes are running, the configuration is successful.
-
-![](/img/%E5%90%AF%E5%8A%A8.png)
-
-### 2.2 Install and Configure Prometheus
-
-> In this example, Prometheus is installed on server 192.168.1.3.
-
-1. Download Prometheus (version 2.30.3 or later). You can download it from the Prometheus website (https://prometheus.io/docs/introduction/first_steps/)
-2. Unzip the installation package and enter the folder:
-
-```Shell
-  tar xvfz prometheus-*.tar.gz
-  cd prometheus-*
-  ```
-
-3. Modify the configuration. Modify the configuration file `prometheus.yml` as follows
-   Add a confignode job to collect monitoring data for ConfigNodes
-   Add a datanode job to collect monitoring data for DataNodes
-
-```YAML
-global:
-  scrape_interval: 15s
-  evaluation_interval: 15s
-scrape_configs:
-  - job_name: "prometheus"
-    static_configs:
-      - targets: ["localhost:9090"]
-  - job_name: "confignode"
-    static_configs:
-      - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"]
-    honor_labels: true
-  - job_name: "datanode"
-    static_configs:
-      - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"]
-    honor_labels: true
-```
-
-4. Start Prometheus. The default retention time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows:
-
-```Shell
-  ./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d
-  ```
-
-5. Confirm successful startup. Open a browser and navigate to http://192.168.1.3:9090 . Navigate to "Status" -> "Targets". If the state of every target is UP, the configuration is successful.
-
-
- - -
- -6. Click the links on the `Targets` page to view monitoring information for the respective nodes. - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 2.3 Install Grafana and Configure the Data Source - -> In this example, Grafana is installed on server 192.168.1.3. - -1. Download Grafana (version 8.4.2 or later). You can download it from the Grafana download page (https://grafana.com/grafana/download) - -2. Unzip the installation package and enter the folder: - -```Shell - tar -zxvf grafana-*.tar.gz - cd grafana-* - ``` - -3. Start Grafana: - -```Shell - ./bin/grafana-server web - ``` - -4. Log in to Grafana. Open a browser and navigate to `http://192.168.1.3:3000` (or the modified port). The default initial username and password are both `admin`. - -5. Configure data sources. Navigate to "Connections" -> "Data sources", add a new data source, and select `Prometheus` as the data source type. - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -Ensure the URL for Prometheus is correct. Click "Save & Test". If the message "Data source is working" appears, the configuration is successful. - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 2.4 Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import the dashboard by uploading a JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Choose one of the JSON files (e.g., `Apache IoTDB ConfigNode Dashboard`). - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. Choose Prometheus as the data source and click "Import" - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. The imported `Apache IoTDB ConfigNode Dashboard` will now be displayed. - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7.
Similarly, import other dashboards such as `Apache IoTDB DataNode Dashboard`, `Apache Performance Overview Dashboard`, and `Apache System Overview Dashboard`. - -
- - - -
- -8. The IoTDB monitoring panel is now fully imported, and you can view monitoring information at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. Appendix, Detailed Explanation of Monitoring Indicators - -### 3.1 System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Cores:CPU cores -- CPU Utilization: - - System CPU Utilization:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Utilization:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited VM Size: The size of virtual memory allocated by the operating system to running processes. - - Total Physical Memory:The total amount of available physical memory in the system. - - Used Physical Memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total Disk Space:The maximum disk space that IoTDB can use. - - Used Disk Space:The disk space already used by IoTDB. -- Logs Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. 
-- File Count: Number of IoTDB-related files - - All: Total number of files - - TsFile: Number of TsFiles - - Seq: Number of sequential TsFiles - - Unseq: Number of unsequential TsFiles - - WAL: Number of WAL files - - Cross-Temp: Number of temp files from cross-space merges - - Inner-Seq-Temp: Number of temp files from merges within the sequential space - - Innsr-Unseq-Temp: Number of temp files from merges within the unsequential space - - Mods: Number of tombstone files -- Open File Handles: Number of file handles opened by the system -- File Size: The size of IoTDB-related files. Each sub-item is the size of the corresponding file type. -- Disk Utilization (%): Equivalent to the %util indicator in iostat; it reflects, to some extent, how busy the disk is. Each sub-item corresponds to one disk. -- Disk I/O Throughput: The average I/O throughput of each disk in the system over a period of time. Each sub-item corresponds to one disk. -- Disk IOPS: Equivalent to the four indicators r/s, w/s, rrqm/s, and wrqm/s in iostat; it is the number of I/O operations a disk performs per second. Read and write are the numbers of individual I/O operations the disk performs. Because of block-device scheduling, multiple adjacent I/Os can sometimes be merged into one; merged read and merged write are the numbers of such merges. -- Disk I/O Latency (Avg): Equivalent to await in iostat, the average latency of each I/O request. Read and write requests are recorded separately. -- Disk I/O Request Size (Avg): Equivalent to avgrq-sz in iostat, the average size of each I/O request. Read and write requests are recorded separately. -- Disk I/O Queue Length (Avg): Equivalent to avgqu-sz in iostat, the average length of the I/O request queue. -- I/O Syscall Rate: The frequency at which the process invokes read and write system calls, similar to IOPS.
-- I/O Throughput: The I/O throughput of the process, divided into two categories: actual-read/write and attempt-read/write. Actual read and actual write are the number of bytes for which the process actually causes block devices to perform I/O, excluding the parts served by the Page Cache. - -#### JVM - -- GC Time Percentage: The proportion of time the node JVM spent on GC within the past one-minute window -- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute by the node JVM, as well as the size of objects newly allocated in the young generation, in the old generation, and by non-generational collectors -- GC Live Data Size: The size of long-lived objects in the node JVM and the maximum size allowed for the old generation -- Heap Memory: JVM heap memory usage. - - Maximum heap memory: The maximum available heap memory size for the JVM. - - Committed heap memory: The size of heap memory that has been committed by the JVM. - - Used heap memory: The size of heap memory already used by the JVM. - - PS Eden Space: The size of the PS Eden area. - - PS Old Space: The size of the PS Old area. - - PS Survivor Space: The size of the PS Survivor area. - - ...(CMS/G1/ZGC, etc.) -- Off-Heap Memory: Off-heap memory usage. - - Direct Memory: Off-heap direct memory. - - Mapped Memory: Off-heap mapped memory.
-- GCs Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Latency Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Events Breakdown Per Minute:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Pause Time Breakdown Per Minute:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- JIT Compilation Time Per Minute:The total time JVM spends compiling per minute -- Loaded & Unloaded Classes: - - Loaded:The number of classes currently loaded by the JVM - - Unloaded:The number of classes uninstalled by the JVM since system startup -- Active Java Threads:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
- -- Network Speed:The speed of network card sending and receiving data -- Network Throughput (Receive/Transmit):The size of data packets sent or received by the network card, calculated from system restart -- Packet Transmission Rate:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Active TCP Connections:The current number of socket connections for the selected process (IoTDB only has TCP) - -### 3.2 Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Cores:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster -- Total Time Series: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Throughput: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster -- Total Files:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBases: The total number of databases managed by the cluster (including replicas) -- Total DataRegions: The total number of DataRegions managed by the cluster -- Total SchemaRegions: The total number of 
SchemaRegions managed by the cluster - -#### Node Overview - -- CPU Cores: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Time Series: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Throughput: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory: The swap memory size of the machine where the node is located -- File Count: Number of files managed by nodes - -#### Performance - -- Session Idle Time: The total idle time and total busy time of the session connection of the node -- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections -- Operation Latency: The time consumption of various types of node operations, including average and P99 -- Average Interface Latency: The average time consumption of each Thrift interface of a node -- P99 Interface Latency: P99 time consumption of each Thrift interface of a node -- Total Tasks: The number of system tasks for each node -- Average Task Latency: The average time spent on various system tasks of a node -- P99 Task Latency: P99 time consumption for various system tasks of nodes -- Operations Per Second: The number of operations per second for a node -- Mainstream Process - - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process - - Average Stage Latency: The average time consumption of each stage in the main process of a node - - P99 Stage Latency: P99 time consumption for each stage of the node's main process -- Schedule Stage - - Schedule Operations Per Second:
The number of operations per second in each sub stage of the node schedule stage - - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage - - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node - - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage - - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage - - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage - - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node - - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Utilization: CPU load of nodes -- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off-Heap Memory: Non heap memory usage of nodes -- Total Java Threads: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Logs Per Minute: Different types of logs per minute for nodes - -### 3.3 ConfigNode Dashboard - -This panel displays the performance of 
all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count: Number of DataRegions for nodes - - DataRegion Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemaRegions for nodes - - SchemaRegion Status: The state of the SchemaRegion of the node -- System Memory Utilization: The system memory size of the node -- Swap Memory Utilization: Node's swap memory size -- ConfigNodes Status: The running status of the ConfigNode in the cluster where the node is located -- DataNodes Status: The running status of the DataNode in the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution: The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node - - Created Clients Per Second: Number of created clients for each thread
pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node -- Client time situation - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemaRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Latency: The time consumption of each stage of the node's Ratis -- Write Log Entry Latency: The time required to write a log for the Ratis of a node -- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write Throughput: QPS of remote and local writes for the Ratis of nodes -- RatisConsensus Memory Utilization: Memory usage of the node's Ratis consensus protocol - -### 3.4 DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc.
- -#### Node Overview - -- Total Managed Entities: Entity situation of node management -- Write Throughput: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - Average Operation Latency: The average time spent on various operations of a node - - P50 Operation Latency: The median time spent on various operations of a node - - P99 Operation Latency: P99 time consumption for various operations of nodes -- Thrift Statistics - - Thrift Interface QPS: QPS of various Thrift interfaces of nodes - - Average Thrift Interface Latency: The average time consumption of each Thrift interface of a node - - Thrift Connections: The number of Thrift connections of each type of node - - Active Thrift Threads: The number of active Thrift threads for each type of node -- Client Statistics - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients for each thread pool of a node - - Created Clients Per Second: Number of created clients for each thread pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by nodes -- File Size: Size of files of various types managed by nodes -- TsFile - - Total TsFile Size Per Level: The total size of TsFile files at each level of
node management - - TsFile Count Per Level: Number of TsFile files at each level of node management - - Average TsFile Size Per Level: The average size of TsFile files at each level of node management -- Total Tasks: Number of Tasks for Nodes -- Task Latency: The time consumption of tasks for nodes -- Compaction - - Compaction Read/Write Throughput: The merge read and write speed of nodes per second - - Compactions Per Minute: The number of merged nodes per minute - - Compaction Chunk Status: The number of Chunks in different states merged by nodes - - Compacted-Points Per Minute: The number of merged nodes per minute - -#### Write Performance - -- Average Write Latency: Average node write time, including writing wal and memtable -- P50 Write Latency: Median node write time, including writing wal and memtable -- P99 Write Latency: P99 for node write time, including writing wal and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL Files:Number of WAL files managed by nodes - - WAL Nodes: Number of WAL nodes managed by nodes - - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes - - WAL Serialization Time (Total): Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Latency: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options - - WAL Buffer Used Ratio: 
The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage - - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage - - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage - - Average Flush Subtask Latency: The average time consumption of each Flush subtask of a node, including sorting, encoding, and IO stages - - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of a node, including sorting, encoding, and IO stages - - P99 Flush Subtask Latency: P99 time consumption of each Flush subtask of a node, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes -- Tsfile Compression Ratio of Flushing MemTable: The compression rate of the TsFile produced when a node flushes a Memtable -- Flush TsFile Size of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions -- Size of Flushing MemTable: The size of the Memtable for node disk flushing -- Points Num of Flushing MemTable: The number of points when flushing data in different DataRegions of a node -- Series Num of Flushing MemTable: The number of time series when flushing Memtables in different DataRegions of a node -- Average Point Num of Flushing MemChunk: The average number of disk flushing points for node MemChunk - -#### Schema Engine - -- Schema Engine Mode: The metadata engine mode of nodes -- Schema Consensus Protocol: Node metadata consensus protocol -- Schema Region Number: Number of SchemaRegions managed by nodes -- Schema Region Memory Overview: The amount of memory in the SchemaRegion of a node -- Memory Usgae per SchemaRegion: The
average memory usage size of node SchemaRegion -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemaRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemaRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion - - Series Type: Number of time series of different types of nodes - - Time Series Number: The total number of time series nodes - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemaRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: Number of pinned IMNode nodes in each SchemaRegion of a node - - Pinned Memory per SchemaRegion: The memory usage size of pinned IMNode nodes in each SchemaRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemaRegion of a node - - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemaRegion of the node - - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes - - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing -
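Many panels in this appendix report the same latency as an average, a P50 (median), and a P99. The sketch below shows why the three are read together, using hand-written latency samples and a simple nearest-rank percentile. This is only an illustration: `nearest_rank` is a hypothetical helper, and the estimator IoTDB and Prometheus actually use for these panels may differ.

```python
import math
import statistics


def nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (assumption): nine fast requests and one slow outlier.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 7, 250]

summary = {
    "avg": statistics.mean(latencies_ms),
    "p50": nearest_rank(latencies_ms, 50),
    "p99": nearest_rank(latencies_ms, 99),
}
print(summary)
```

With these samples the median stays at 4 ms, the single outlier pulls the average up to 28.9 ms, and the P99 lands on the 250 ms outlier itself, so a gap between the P50 and P99 panels points at tail latency rather than a general slowdown.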
-#### Query Engine - -- Time Consumption In Each Stage - - Average Query Plan Execution Time: The average time spent on node queries at each stage - - P50 Query Plan Execution Time: Median time spent on node queries at each stage - - P99 Query Plan Execution Time: P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution - - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution - - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time -- Execution Plan Execution Time - - Average Query Execution Time: The average execution time of node query execution plan - - P50 Query Execution Time:Median execution time of node query execution plan - - P99 Query Execution Time: P99 of node query execution plan execution time -- Operator Execution Time - - Average Query Operator Execution Time: The average execution time of node query operators - - P50 Query Operator Execution Time: Median execution time of node query operator - - P99 Query Operator Execution Time: P99 of node query operator execution time -- Aggregation Query Computation Time - - Average Query Aggregation Execution Time: The average computation time for node aggregation queries - - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries - - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes - - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes - - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - Average Query Resource Utilization: The average number of resource visits for node queries - - P50 Query 
Resource Utilization: Median number of resource visits for node queries - - P99 Query Resource Utilization: P99 for node query resource access quantity -- Data Transmission Time - - Average Query Data Exchange Latency: The average time spent on node query data transmission - - P50 Query Data Exchange Latency: Median query data transmission time for nodes - - P99 Query Data Exchange Latency: P99 for node query data transmission time -- Number Of Data Transfers - - Average Query Data Exchange Count: The average number of data transfers queried by nodes - - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - Query Queue Length: Node query task scheduling quantity - - Average Query Scheduling Latency: The average time spent on scheduling node query tasks - - P50 Query Scheduling Latency: Median time spent on node query task scheduling - - P99 Query Scheduling Latency: P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata - - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries - - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata -- Read Time Series - - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series - - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series - - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series -- Modify Time Series Metadata - - Average Timeseries Metadata Modification Time:The average time taken for node queries to modify time series metadata - - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for 
nodes - - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata -- Load Chunk Metadata List - - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists - - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list - - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list -- Modify Chunk Metadata - - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata - - P50 Chunk Metadata Modification Time: The median time it takes for node queries to modify Chunk metadata - - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata -- Filter According To Chunk Metadata - - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata - - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata - - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata -- Constructing Chunk Reader - - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries - - P50 Chunk Reader Construction Time: Median time spent on constructing Chunk Reader for node queries - - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries -- Read Chunk - - Average Chunk Read Time: The average time taken for node queries to read Chunks - - P50 Chunk Read Time: Median time taken for node queries to read Chunks - - P99 Chunk Read Time: P99 time taken for node queries to read Chunks -- Initialize Chunk Reader - - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries - - P50 Chunk Reader Initialization Time: Median
time spent initializing Chunk Reader for node queries - - P99 Chunk Reader Initialization Time: P99 time spent initializing Chunk Reader for node queries -- Constructing TsBlock Through Page Reader - - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader - - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries - - P99 TsBlock Construction Time from Page Reader: P99 time spent on constructing TsBlock through Page Reader for node queries -- Constructing TsBlock Through Merge Reader - - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader - - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries - - P99 TsBlock Construction Time from Merge Reader: P99 time spent on constructing TsBlock through Merge Reader for node queries - -#### Query Data Exchange - -Time consumption of data exchange for queries.
- -- Obtain TsBlock through source handle - - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle - - P50 Source Handle TsBlock Retrieval Time: The median time taken for node queries to obtain TsBlock through source handle - - P99 Source Handle TsBlock Retrieval Time: P99 time taken for node queries to obtain TsBlock through source handle -- Deserialize TsBlock through source handle - - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle - - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle - - P99 Source Handle TsBlock Deserialization Time: P99 time taken for node queries to deserialize TsBlock through source handle -- Send TsBlock through sink handle - - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle - - P50 Sink Handle TsBlock Transmission Time: The median time taken for node queries to send TsBlock through sink handle - - P99 Sink Handle TsBlock Transmission Time: P99 time taken for node queries to send TsBlock through sink handle -- Callback data block event - - Average Data Block Event Acknowledgment Time: The average time taken for node queries to call back data block events - - P50 Data Block Event Acknowledgment Time: The median time taken for node queries to call back data block events - - P99 Data Block Event Acknowledgment Time: P99 time taken for node queries to call back data block events -- Get Data Block Tasks - - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks - - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks - - P99 Data Block Task Retrieval Time: P99 time taken for node queries to obtain data block tasks - -#### Query Related Resource - -- 
MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries -- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards -- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running -- Coordinator: The number of queries recorded on the node -- MemoryPool Size: Node query related memory pool situation -- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values -- DriverScheduler Count: Number of queue tasks related to node queries - -#### Consensus - IoT Consensus - -- Memory Usage - - IoTConsensus Used Memory: The memory usage of IoTConsensus on the node, including total memory usage, queue usage, and synchronization usage -- Synchronization Status Between Nodes - - IoTConsensus Sync Index Size: SyncIndex size for different DataRegions of the node's IoTConsensus - - IoTConsensus Overview: The total synchronization gap and cached request count of the node's IoTConsensus - - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex for different DataRegions of the node's IoTConsensus - - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronization SafeIndex for different DataRegions of the node's IoTConsensus - - IoTConsensus LogDispatcher Request Size: The size of the requests that the node's IoTConsensus sends to other nodes to synchronize different DataRegions - - Sync Lag: The synchronization gap of different DataRegions of the node's IoTConsensus - - Min Peer Sync Lag: The minimum synchronization gap from different DataRegions to different replicas of the node's IoTConsensus - - Peer Sync Speed Difference: The maximum difference in synchronization from different DataRegions to different replicas of the node's IoTConsensus - - IoTConsensus LogEntriesFromWAL Rate: The rate at which the node's IoTConsensus obtains logs from the WAL for different DataRegions - - IoTConsensus 
LogEntriesFromQueue Rate: The rate at which the node's IoTConsensus obtains logs from the queue for different DataRegions -- Different Execution Stages Take Time - - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of the node's IoTConsensus - - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of the node's IoTConsensus - - The Time Consumed of Different Stages (99%): P99 of the time spent on different execution stages of the node's IoTConsensus - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis diff --git a/src/UserGuide/latest-Table/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/latest-Table/IoTDB-Introduction/Commercial-Support_apache.md index 349dcb064..976e153c5 100644 --- a/src/UserGuide/latest-Table/IoTDB-Introduction/Commercial-Support_apache.md +++ b/src/UserGuide/latest-Table/IoTDB-Introduction/Commercial-Support_apache.md @@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver | | 
Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer | | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member | | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member | diff --git a/src/UserGuide/latest-Table/Reference/System-Tables_apache.md b/src/UserGuide/latest-Table/Reference/System-Tables_apache.md index 702a9c15a..1ffc07358 100644 --- a/src/UserGuide/latest-Table/Reference/System-Tables_apache.md +++ b/src/UserGuide/latest-Table/Reference/System-Tables_apache.md @@ -524,7 +524,6 @@ IoTDB> select * from information_schema.keywords limit 10 | internal\_port | INT32 | ATTRIBUTE | Internal port | | version | STRING | ATTRIBUTE | Version number | | build\_info | STRING | ATTRIBUTE | Commit ID | -| activate\_status (Enterprise Edition only) | STRING | ATTRIBUTE | Activation status | * Only administrators are allowed to perform operations on this table. * Query example: diff --git a/src/UserGuide/latest-Table/User-Manual/Load-Balance.md b/src/UserGuide/latest-Table/User-Manual/Load-Balance.md index ef69e1def..cfc42679f 100644 --- a/src/UserGuide/latest-Table/User-Manual/Load-Balance.md +++ b/src/UserGuide/latest-Table/User-Manual/Load-Balance.md @@ -211,7 +211,7 @@ Total line number = 4 It costs 0.110s ``` -7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. Otherwise, you will need to contact the support team to reapply for activation code information. +7. Repeat the above steps for other nodes. 
It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. #### 1.3.3 Manual Load Balancing (Optional) diff --git a/src/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md index 7ba937040..e3b6b59b4 100644 --- a/src/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -274,7 +274,7 @@ On each server, two yml files need to be written, namely confignnode. yml and da version: "3" services: iotdb-confignode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-confignode command: ["bash", "-c", "entrypoint.sh confignode"] @@ -309,7 +309,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #The image used + image: apache/iotdb:2.0.x-standalone #The image used hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 41c28734c..000000000 --- a/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,694 +0,0 @@ - -# Monitoring Panel Deployment - -The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. 
It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. - -The instructions for using the monitoring panel tool can be found in the [Instructions](../Tools-System/Monitor-Tool.md) section of the document. - -## 1. Installation Preparation - -1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain -2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain - -## 2. Installation Steps - -### 2.1 IoTDB enables monitoring indicator collection - -1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
- -| **Configuration** | **Configuration File** | **Description** | -| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and keep the default port 9091. If another port is set, make sure it does not conflict with other ports | -| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and keep the default port 9092. 
If another port is set, make sure it does not conflict with other ports | - -Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: - -| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | -| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. Restart all nodes. 
After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: - -```Bash -# Unix/OS X -./sbin/stop-standalone.sh #Stop confignode and datanode first -./sbin/start-confignode.sh -d #Start confignode -./sbin/start-datanode.sh -d #Start datanode - -# Windows -# Before version V2.0.4.x -.\sbin\stop-standalone.bat -.\sbin\start-confignode.bat -.\sbin\start-datanode.bat - -# V2.0.4.x and later versions -.\sbin\windows\stop-standalone.bat -.\sbin\windows\start-confignode.bat -.\sbin\windows\start-datanode.bat -``` - -3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 2.2 Install and configure Prometheus - -> Taking Prometheus installed on server 192.168.1.3 as an example. - -1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the unzipped folder: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. Modify the configuration. Modify the configuration file prometheus.yml as follows - 1. Add configNode task to collect monitoring data for ConfigNode - 2. Add a datanode task to collect monitoring data for DataNodes - -```YAML -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. 
In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. Confirm successful startup. Open http://192.168.1.3:9090 in a browser to enter Prometheus, then click Targets under Status. When the State of every target is Up, the configuration and connectivity are working. - -
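The browser check in step 5 can also be scripted. The following is a minimal Python sketch, assuming the Prometheus server from this example is reachable at http://192.168.1.3:9090 and serves the standard `/api/v1/targets` HTTP API endpoint. The helper names `fetch_targets` and `unhealthy_targets` are illustrative only and are not part of IoTDB or Prometheus.

```python
import json
from urllib.request import urlopen

# Assumption: the Prometheus address used in this guide's example.
PROMETHEUS_URL = "http://192.168.1.3:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch the target list from the Prometheus HTTP API as a dict."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list:
    """Return 'job scrapeUrl health' for every active target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    report = []
    for target in targets:
        health = target.get("health", "unknown")
        if health != "up":
            job = target.get("labels", {}).get("job", "?")
            url = target.get("scrapeUrl", "?")
            report.append(f"{job} {url} {health}")
    return report
```

Calling `unhealthy_targets(fetch_targets())` on a machine that can reach Prometheus should return an empty list when all ConfigNode and DataNode targets are Up. Any entry in the list names the job and scrape URL that still need attention.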
- -6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 2.3 Install Grafana and configure the data source - -> Taking Grafana installed on server 192.168.1.3 as an example. - -1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) -2. Unzip and enter the corresponding folder - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. Start Grafana: - -```Shell -./bin/grafana-server web -``` - -4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. - -5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 2.4 Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import Dashboard using upload JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 
Select Prometheus as the data source and click Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. Similarly, import the Apache IoTDB DataNode Dashboard, Apache Performance Overview Dashboard, and Apache System Overview Dashboard. You can then see the following monitoring panels: - -
- -8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. Appendix, Detailed Explanation of Monitoring Indicators - -### 3.1 System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Cores:CPU cores -- CPU Utilization: - - System CPU Utilization:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Utilization:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited VM Size: The size of virtual memory allocated by the operating system to running processes. - - Total Physical Memory:The total amount of available physical memory in the system. - - Used Physical Memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total Disk Space:The maximum disk space that IoTDB can use. - - Used Disk Space:The disk space already used by IoTDB. -- Logs Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. 
-- File Count: Number of IoTDB related files - - All: Number of all files - - TsFile: Number of TsFiles - - Seq: Number of sequential TsFiles - - Unseq: Number of unsequential TsFiles - - WAL: Number of WAL files - - Cross-Temp: Number of cross space merge temp files - - Inner-Seq-Temp: Number of merged temp files in sequential space - - Inner-Unseq-Temp: Number of merged temp files in unsequential space - - Mods: Number of tombstone files -- Open File Handles: Number of file handles opened by the system -- File Size: The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. -- Disk Utilization (%): Equivalent to the %util indicator in iostat, it reflects to some extent how busy the disk is. Each sub item is an indicator corresponding to the disk. -- Disk I/O Throughput: The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk IOPS: Equivalent to the four indicators r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Because of the scheduling algorithm of block devices, multiple adjacent I/Os can in some cases be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Latency (Avg): Equivalent to await in iostat, which is the average latency of each I/O request. Read and write requests are recorded separately. -- Disk I/O Request Size (Avg): Equivalent to avgrq-sz in iostat, it reflects the size of each I/O request. Read and write requests are recorded separately. -- Disk I/O Queue Length (Avg): Equivalent to avgqu-sz in iostat, which is the average length of the I/O request queue. -- I/O Syscall Rate: The frequency at which the process invokes read and write system calls, similar to IOPS. 
-- I/O Throughput: The throughput of process I/O can be divided into two categories: actual-read/write and attempt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. - -#### JVM - -- GC Time Percentage: The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute by the node JVM, as well as the size of objects newly allocated in the young generation, the old generation, and non-generational heaps -- GC Live Data Size: The size of long-lived objects in the node JVM and the maximum size allowed for the old generation -- Heap Memory: JVM heap memory usage. - - Maximum heap memory: The maximum available heap memory size for the JVM. - - Committed heap memory: The size of heap memory that has been committed by the JVM. - - Used heap memory: The size of heap memory already used by the JVM. - - PS Eden Space: The size of the PS Young area. - - PS Old Space: The size of the PS Old area. - - PS Survivor Space: The size of the PS survivor area. - - ...(CMS/G1/ZGC, etc) -- Off-Heap Memory: Off-heap memory usage. - - Direct Memory: Off-heap direct memory. - - Mapped Memory: Off-heap mapped memory. 
-- GCs Per Minute: The average number of garbage collections per minute by the node JVM, including YGC and FGC -- GC Latency Per Minute: The average time it takes for the node JVM to perform garbage collection per minute, including YGC and FGC -- GC Events Breakdown Per Minute: The average number of garbage collections per minute by the node JVM due to different reasons, including YGC and FGC -- GC Pause Time Breakdown Per Minute: The average time spent by the node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- JIT Compilation Time Per Minute: The total time the JVM spends compiling per minute -- Loaded & Unloaded Classes: - - Loaded: The number of classes currently loaded by the JVM - - Unloaded: The number of classes unloaded by the JVM since system startup -- Active Java Threads: The current number of live threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the loopback (virtual) network card. 
- -- Network Speed:The speed of network card sending and receiving data -- Network Throughput (Receive/Transmit):The size of data packets sent or received by the network card, calculated from system restart -- Packet Transmission Rate:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Active TCP Connections:The current number of socket connections for the selected process (IoTDB only has TCP) - -### 3.2 Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Cores:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster -- Total Time Series: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Throughput: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster -- Total Files:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBases: The total number of databases managed by the cluster (including replicas) -- Total DataRegions: The total number of DataRegions managed by the cluster -- Total SchemaRegions: The total number of 
SchemaRegions managed by the cluster - -#### Node Overview - -- CPU Cores: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Time Series: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Throughput: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory: The swap memory size of the machine where the node is located -- File Count: Number of files managed by nodes - -#### Performance - -- Session Idle Time: The total idle time and total busy time of the session connection of the node -- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections -- Operation Latency: The time consumption of various types of node operations, including average and P99 -- Average Interface Latency: The average time consumption of each Thrift interface of a node -- P99 Interface Latency: P99 time consumption of each Thrift interface of a node -- Total Tasks: The number of system tasks for each node -- Average Task Latency: The average time spent on various system tasks of a node -- P99 Task Latency: P99 time consumption for various system tasks of nodes -- Operations Per Second: The number of operations per second for a node -- Mainstream Process - - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process - - Average Stage Latency: The average time consumption of each stage in the main process of a node - - P99 Stage Latency: P99 time consumption for each stage of the node's main process -- Schedule Stage - - Schedule Operations Per Second: 
The number of operations per second in each sub stage of the node schedule stage - - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage - - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node - - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage - - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage - - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage - - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node - - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Utilization: CPU load of nodes -- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off-Heap Memory: Non heap memory usage of nodes -- Total Java Threads: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Logs Per Minute: Different types of logs per minute for nodes - -### 3.3 ConfigNode Dashboard - -This panel displays the performance of 
all management nodes in the cluster, including partitioning, node information, and client connection statistics. - -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count: Number of DataRegions for nodes - - DataRegion Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemaRegions for nodes - - SchemaRegion Status: The state of the SchemaRegion of the node -- System Memory Utilization: The system memory size of the node -- Swap Memory Utilization: Node's swap memory size -- ConfigNodes Status: The running status of the ConfigNodes in the cluster where the node is located -- DataNodes Status: The running status of the DataNodes in the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNodes and DataNodes -- ConfigNode Status: The status of the ConfigNodes in the cluster where the node is located -- DataNode Status: The status of the DataNodes in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution: The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node - - Created Clients Per Second: Number of created clients for each thread
pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node -- Client time situation - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Latency: The time consumption of each stage of the node's Ratis -- Write Log Entry Latency: The time required to write a log for the Ratis of a node -- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write Throughput: Remote and local QPS written to node Ratis -- RatisConsensus Memory Utilization: Memory usage of Node Ratis consensus protocol - -### 3.4 DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
- -#### Node Overview - -- Total Managed Entities: Entity situation of node management -- Write Throughput: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - Average Operation Latency: The average time spent on various operations of a node - - P50 Operation Latency: The median time spent on various operations of a node - - P99 Operation Latency: P99 time consumption for various operations of nodes -- Thrift Statistics - - Thrift Interface QPS: QPS of various Thrift interfaces of nodes - - Average Thrift Interface Latency: The average time consumption of each Thrift interface of a node - - Thrift Connections: The number of Thrift connections of each type on the node - - Active Thrift Threads: The number of active Thrift threads of each type on the node -- Client Statistics - - Active Clients: The number of active clients in each thread pool of a node - - Idle Clients: The number of idle clients in each thread pool of a node - - Borrowed Clients Per Second: Number of borrowed clients for each thread pool of a node - - Created Clients Per Second: Number of created clients for each thread pool of the node - - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node - - Average Client Active Time: The average active time of clients in each thread pool of a node - - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node - - Average Client Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by nodes -- File Size: The size of files of various types managed by nodes -- TsFile - - Total TsFile Size Per Level: The total size of TsFile files at each level of
node management - - TsFile Count Per Level: Number of TsFile files at each level of node management - - Average TsFile Size Per Level: The average size of TsFile files at each level of node management -- Total Tasks: Number of Tasks for Nodes -- Task Latency: The time consumption of tasks for nodes -- Compaction - - Compaction Read/Write Throughput: The compaction read and write speed of the node per second - - Compactions Per Minute: The number of compactions performed by the node per minute - - Compaction Chunk Status: The number of Chunks in different states compacted by the node - - Compacted-Points Per Minute: The number of data points compacted by the node per minute - -#### Write Performance - -- Average Write Latency: Average node write time, including writing WAL and memtable -- P50 Write Latency: Median node write time, including writing WAL and memtable -- P99 Write Latency: P99 for node write time, including writing WAL and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL Files: Number of WAL files managed by nodes - - WAL Nodes: Number of WAL nodes managed by nodes - - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes - - WAL Serialization Time (Total): Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Latency: The time taken for the node's WAL to flush the SyncBuffer, including both synchronous and asynchronous modes - - WAL Buffer Used Ratio:
The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage - - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage - - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage - - Average Flush Subtask Latency: The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages - - P99 Flush Subtask Latency: The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes -- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile produced when the node flushes a Memtable -- Flush TsFile Size of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions -- Size of Flushing MemTable: The size of the Memtable for node disk flushing -- Points Num of Flushing MemTable: The number of points when flushing data in different DataRegions of a node -- Series Num of Flushing MemTable: The number of time series when flushing Memtables in different DataRegions of a node -- Average Point Num of Flushing MemChunk: The average number of disk flushing points for node MemChunk - -#### Schema Engine - -- Schema Engine Mode: The metadata engine mode of nodes -- Schema Consensus Protocol: Node metadata consensus protocol -- Schema Region Number: Number of SchemaRegions managed by nodes -- Schema Region Memory Overview: The amount of memory in the SchemaRegion of a node -- Memory Usgae per SchemaRegion: The
average memory usage of each SchemaRegion on the node -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemaRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemaRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series per SchemaRegion on the node - - Series Type: Number of time series of different types on the node - - Time Series Number: The total number of time series on the node - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of time series created through templates in each SchemaRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: The number of pinned IMNodes in each SchemaRegion of the node - - Pinned Memory per SchemaRegion: The memory usage of pinned IMNodes in each SchemaRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNodes in each SchemaRegion of a node - - Unpinned Memory per SchemaRegion: The memory usage of unpinned IMNodes in each SchemaRegion of the node - - Schema File Memory MNode Number: The global number of pinned and unpinned IMNodes on the node - - Release and Flush MNode Rate: The number of IMNodes released and flushed by the node per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing -
-#### Query Engine - -- Time Consumption In Each Stage - - Average Query Plan Execution Time: The average time spent on node queries at each stage - - P50 Query Plan Execution Time: Median time spent on node queries at each stage - - P99 Query Plan Execution Time: P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution - - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution - - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time -- Execution Plan Execution Time - - Average Query Execution Time: The average execution time of node query execution plan - - P50 Query Execution Time:Median execution time of node query execution plan - - P99 Query Execution Time: P99 of node query execution plan execution time -- Operator Execution Time - - Average Query Operator Execution Time: The average execution time of node query operators - - P50 Query Operator Execution Time: Median execution time of node query operator - - P99 Query Operator Execution Time: P99 of node query operator execution time -- Aggregation Query Computation Time - - Average Query Aggregation Execution Time: The average computation time for node aggregation queries - - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries - - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes - - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes - - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - Average Query Resource Utilization: The average number of resource visits for node queries - - P50 Query 
Resource Utilization: Median number of resource visits for node queries - - P99 Query Resource Utilization: P99 for node query resource access quantity -- Data Transmission Time - - Average Query Data Exchange Latency: The average time spent on node query data transmission - - P50 Query Data Exchange Latency: Median query data transmission time for nodes - - P99 Query Data Exchange Latency: P99 for node query data transmission time -- Number Of Data Transfers - - Average Query Data Exchange Count: The average number of data transfers queried by nodes - - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - Query Queue Length: Node query task scheduling quantity - - Average Query Scheduling Latency: The average time spent on scheduling node query tasks - - P50 Query Scheduling Latency: Median time spent on node query task scheduling - - P99 Query Scheduling Latency: P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata - - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries - - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata -- Read Time Series - - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series - - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series - - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series -- Modify Time Series Metadata - - Average Timeseries Metadata Modification Time:The average time taken for node queries to modify time series metadata - - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for 
nodes - - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata -- Load Chunk Metadata List - - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists - - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list - - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list -- Modify Chunk Metadata - - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata - - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries - - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata -- Filter According To Chunk Metadata - - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata - - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata - - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata -- Constructing Chunk Reader - - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries - - P50 Chunk Reader Construction Time: Median time spent on constructing Chunk Reader for node queries - - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries -- Read Chunk - - Average Chunk Read Time: The average time taken for node queries to read Chunks - - P50 Chunk Read Time: Median time taken for node queries to read Chunks - - P99 Chunk Read Time: P99 time taken for node queries to read Chunks -- Initialize Chunk Reader - - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries - - P50 Chunk Reader Initialization Time: Median
time spent initializing Chunk Reader for node queries - - P99 Chunk Reader Initialization Time:P99 time spent initializing Chunk Reader for node queries -- Constructing TsBlock Through Page Reader - - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader - - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries - - P99 TsBlock Construction Time from Page Reader:Node query using Page Reader to construct TsBlock time-consuming P99 -- Query the construction of TsBlock through Merge Reader - - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader - - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries - - P99 TsBlock Construction Time from Merge Reader: Node query using Merge Reader to construct TsBlock time-consuming P99 - -#### Query Data Exchange - -The data exchange for the query is time-consuming. 
- -- Obtain TsBlock through source handle - - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle - - P50 Source Handle TsBlock Retrieval Time:Node query obtains the median time spent on TsBlock through source handle - - P99 Source Handle TsBlock Retrieval Time: Node query obtains TsBlock time P99 through source handle -- Deserialize TsBlock through source handle - - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle - - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle - - P99 Source Handle TsBlock Deserialization Time: P99 time spent on deserializing TsBlock through source handle for node query -- Send TsBlock through sink handle - - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle - - P50 Sink Handle TsBlock Transmission Time: Node query median time spent sending TsBlock through sink handle - - P99 Sink Handle TsBlock Transmission Time: Node query sends TsBlock through sink handle with a time consumption of P99 -- Callback data block event - - Average Data Block Event Acknowledgment Time: The average time taken for node query callback data block event - - P50 Data Block Event Acknowledgment Time: Median time spent on node query callback data block event - - P99 Data Block Event Acknowledgment Time: P99 time consumption for node query callback data block event -- Get Data Block Tasks - - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks - - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks - - P99 Data Block Task Retrieval Time: P99 time consumption for node query to obtain data block task - -#### Query Related Resource - -- 
MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries -- LocalExecutionPlanner: The remaining memory that nodes can allocate to query fragments -- FragmentInstanceManager: The query fragment context information and the number of query fragments that the node is running -- Coordinator: The number of queries recorded on the node -- MemoryPool Size: Node query related memory pool situation -- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values -- DriverScheduler Count: Number of queue tasks related to node queries - -#### Consensus - IoT Consensus - -- Memory Usage - - IoTConsensus Used Memory: The memory usage of IoTConsensus on the node, including total memory usage, queue usage, and synchronization usage -- Synchronization Status Between Nodes - - IoTConsensus Sync Index Size: The SyncIndex size of IoTConsensus for each DataRegion of the node - - IoTConsensus Overview: The total synchronization gap and cached request count of IoTConsensus on the node - - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex of IoTConsensus for each DataRegion of the node - - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronization SafeIndex of IoTConsensus for each DataRegion of the node - - IoTConsensus LogDispatcher Request Size: The size of the requests that IoTConsensus on the node sends to other nodes to synchronize each DataRegion - - Sync Lag: The synchronization gap of IoTConsensus for each DataRegion of the node - - Min Peer Sync Lag: The minimum synchronization gap from each DataRegion to its replicas for IoTConsensus on the node - - Peer Sync Speed Difference: The maximum difference in synchronization from each DataRegion to its replicas for IoTConsensus on the node - - IoTConsensus LogEntriesFromWAL Rate: The rate at which IoTConsensus on the node obtains logs from the WAL for each DataRegion - - IoTConsensus
LogEntriesFromQueue Rate: The rate at which IoTConsensus on the node obtains logs from the queue for each DataRegion -- Different Execution Stages Take Time - - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of IoTConsensus on the node - - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of IoTConsensus on the node - - The Time Consumed of Different Stages (99%): P99 of the time consumption for different execution stages of IoTConsensus on the node - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: The time consumption of different stages of node Ratis -- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis -- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely -- RatisConsensus Memory Usage: Memory usage of node Ratis diff --git a/src/UserGuide/latest/Ecosystem-Integration/DataEase.md b/src/UserGuide/latest/Ecosystem-Integration/DataEase.md index 9a22a1bf4..6a7ced920 100644 --- a/src/UserGuide/latest/Ecosystem-Integration/DataEase.md +++ b/src/UserGuide/latest/Ecosystem-Integration/DataEase.md @@ -43,12 +43,12 @@ | :-------------------- | :----------------------------------------------------------- | | IoTDB | Version not required, please refer to [Deployment
Guidance](../QuickStart/QuickStart_apache.md) | | JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) | -| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact staff) | -| DataEase-IoTDB Connector | Please contact staff for assistance | +| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported) | +| DataEase-IoTDB Connector | Obtain the installation package | ## 3. Installation Steps -Step 1: Please contact staff to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip` +Step 1: Unzip the installation package `iotdb-api-source-1.0.0.zip` Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder diff --git a/src/UserGuide/latest/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/latest/Ecosystem-Integration/Thingsboard.md index adb1f172e..5a2688cac 100644 --- a/src/UserGuide/latest/Ecosystem-Integration/Thingsboard.md +++ b/src/UserGuide/latest/Ecosystem-Integration/Thingsboard.md @@ -42,13 +42,13 @@ | :---------------------------------------- | :----------------------------------------------------------- | | JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) | | IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package.md) | -| ThingsBoard
(IoTDB adapted version) | Please contact commercial support to obtain the installation package. Detailed installation steps are provided below. | +| ThingsBoard
(IoTDB adapted version) | Obtain the installation package. Detailed installation steps are provided below. | ## 3. Installation Steps Please refer to the installation steps on [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/),wherein: -- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the installation package provided by your contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. +- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the obtained installation package to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. - [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/) 【Step 3: Configure ThingsBoard Database - ThingsBoard Configuration】 In this step, you need to add environment variables according to the following content ```Shell diff --git a/src/UserGuide/latest/IoTDB-Introduction/Commercial-Support_apache.md b/src/UserGuide/latest/IoTDB-Introduction/Commercial-Support_apache.md index 349dcb064..976e153c5 100644 --- a/src/UserGuide/latest/IoTDB-Introduction/Commercial-Support_apache.md +++ b/src/UserGuide/latest/IoTDB-Introduction/Commercial-Support_apache.md @@ -33,7 +33,7 @@ The information provided here was provided by the entities named, and is not ver | | Name | Description | Contact Person(s) | Contact Email(s) | Contact Phone(s) | Involvement Level | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------------- | ------------------ | ------------------- | -| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. | Pengchen Zheng | pengcheng.zheng@timecho.com | - | Committer | +| Timecho logo | [Timecho Europe GmbH](https://www.timecho-global.com/) | Enterprise-grade products and solutions, technical support/ consulting/ training, deployment and migration, performance tuning, training, custom development, protocol/ connector/ driver development, time-series foundation model services, and AI services. 
| Pengcheng Zheng | pengcheng.zheng@timecho.com | - | Committer | | pragmatic industries logo | [pragmatic industries GmbH](https://pragmaticindustries.com/)| Technical support/ consulting/ training, deployment and migration, custom development | Julian Feinauer | j.feinauer@pragmaticindustries.de | - | PMC Member | | ToddySoft logo | [ToddySoft GmbH](https://toddysoft.com/)| Technical support/ consulting/ training, deployment and migration, protocol/ connector/ driver development, custom development | Christofer Dutz | christofer.dutz@toddysoft.com | - | PMC Member | diff --git a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md index cca59fca3..e06da5fb0 100644 --- a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md +++ b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md @@ -29,10 +29,10 @@ Based on the ability of user-defined functions, IoTDB provides a series of funct 1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - | UDF installation package | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3 and above |Please contact staff for assistance | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact staff for assistance| + | UDF installation package | Supported IoTDB versions | + | --------------- | ----------------- | + | apache-UDF-1.3.3.zip | V1.3.3 and above | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster 3. In the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute the corresponding function registration statement as follows. 
diff --git a/src/UserGuide/latest/User-Manual/Load-Balance.md b/src/UserGuide/latest/User-Manual/Load-Balance.md index c9d215c0a..355c06f2d 100644 --- a/src/UserGuide/latest/User-Manual/Load-Balance.md +++ b/src/UserGuide/latest/User-Manual/Load-Balance.md @@ -211,7 +211,7 @@ Total line number = 4 It costs 0.110s ``` -7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. Otherwise, you will need to contact the support team to reapply for activation code information. +7. Repeat the above steps for other nodes. It is important to note that for a new node to join the original cluster successfully, the original cluster must have sufficient allowance for additional DataNode nodes. #### 1.3.3 Manual Load Balancing (Optional) diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md index 55ee7dbee..0ddb744f2 100644 --- a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -307,7 +307,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #使用的镜像 + image: apache/iotdb:2.0.x-standalone #使用的镜像 hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] @@ -426,4 +426,4 @@ docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf cd /docker-iotdb docker-compose -f confignode.yml up -d docker-compose -f datanode.yml up -d -``` \ No newline at end of file +``` diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md index 
81b20598e..c54e4be1d 100644 --- a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵 ### 2.1 版本要求 -IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 +IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 ### 2.2 硬盘分区 @@ -206,4 +206,4 @@ ulimit -n } #添加JDK环境变量 source ~/.bashrc #配置环境生效 java -version #检查JDK环境 -``` \ No newline at end of file +``` diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 63264b5dd..000000000 --- a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,694 +0,0 @@ - -# 监控面板部署 - -IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 - -## 1. 安装准备 - -1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 -2. 获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 - -## 2. 安装步骤 - -### 步骤一:IoTDB开启监控指标采集 - -1. 
打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 - -| 配置项 | 所在配置文件 | 配置说明 | -| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | -| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | - -以3C3D集群为例,需要修改的监控配置如下: - -| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | -| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT 
dn_metric_prometheus_reporter_port=9092 | - -2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: - -```shell -# Unix/OS X -./sbin/stop-standalone.sh #先停止confignode和datanode -./sbin/start-confignode.sh -d #启动confignode -./sbin/start-datanode.sh -d #启动datanode - -# Windows -# V2.0.4.x 版本之前 -.\sbin\stop-standalone.bat -.\sbin\start-confignode.bat -.\sbin\start-datanode.bat - -# V2.0.4.x 版本及之后 -.\sbin\windows\stop-standalone.bat -.\sbin\windows\start-confignode.bat -.\sbin\windows\start-datanode.bat -``` - -3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 步骤二:安装、配置Prometheus - -> 此处以prometheus安装在服务器192.168.1.3为例。 - -1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) -2. 解压安装包,进入解压后的文件夹: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. 修改配置。修改配置文件prometheus.yml如下 - 1. 新增confignode任务收集ConfigNode的监控数据 - 2. 新增datanode任务收集DataNode的监控数据 - -```shell -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 - -
- - - -6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 步骤三:安装grafana并配置数据源 - -> 此处以Grafana安装在服务器192.168.1.3为例。 - -1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) -2. 解压并进入对应文件夹 - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. 启动Grafana: - -```Shell -./bin/grafana-server web -``` - -4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 - -5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 步骤四:导入IoTDB Grafana看板 - -1. 进入Grafana,选择Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. 点击右侧 Import 按钮 - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. 使用upload json file的方式导入Dashboard - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 选择数据源为Prometheus,然后点击Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: - -
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. 附录、监控指标详解 - -### 3.1 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Cores:CPU 核数 -- CPU Utilization: - - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total Physical Memory:系统可用物理内存的总量。 - - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total Disk Space:IoTDB 可使用的最大磁盘空间。 - - Used Disk Space:IoTDB 已经使用的磁盘空间。 -- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - All:所有文件数量 - - TsFile:TsFile 数量 - - Seq:顺序 TsFile 数量 - - Unseq:乱序 TsFile 数量 - - WAL:WAL 文件数量 - - Cross-Temp:跨空间合并 temp 文件数量 - - Tnner-Seq-Temp:顺序空间内合并 temp 文件数量 - - Innser-Unseq-Temp:乱序空间内合并 temp 文件数量 - - Mods:墓碑文件数量 -- Open File Handles:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 
和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum Heap Memory:JVM 最大可用的堆内存大小。 - - Committed Heap Memory:JVM 已提交的堆内存大小。 - - Used Heap Memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off-Heap Memory:堆外内存用量。 - - Direct Memory:堆外直接内存。 - - Mapped Memory:堆外映射内存。 -- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 -- Loaded & Unloaded Classes: - - Loaded:JVM 目前已经加载的类的数量 - - Unloaded:系统启动至今 JVM 卸载的类的数量 -- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Network Speed:网卡发送和接收数据的速度 -- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 3.2 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Cores: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 -- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 -- Total Files: 集群管理文件总数量 -- Cluster System Overview: 
集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBases: 集群管理的 Database 总数(含副本) -- Total DataRegions: 集群管理的 DataRegion 总数 -- Total SchemaRegions: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Cores: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Time Series: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Throughput: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Count: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 -- Average Interface Latency: 节点的各个 thrift 接口平均耗时 -- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 -- Total Tasks: 节点的各项系统任务数量 -- Average Task Latency: 节点的各项系统任务的平均耗时 -- P99 Task Latency: 节点的各项系统任务的 P99 耗时 -- Operations Per Second: 节点的每秒操作数 -- 主流程 - - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 - - Average Stage Latency: 节点主流程各阶段平均耗时 - - P99 Stage Latency: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 - - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 - - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 - - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 - - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 - - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 - - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 - - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Utilization: 节点的 CPU 负载 -- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- Off-Heap 
Memory: 节点的非堆内存使用情况 -- Total Java Threads: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Logs Per Minute: 节点的每分钟不同类型日志情况 - -### 3.3 ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Status: 节点的 SchemaRegion 的状态 -- System Memory Utilization: 节点的系统内存大小 -- Swap Memory Utilization: 节点的交换区内存大小 -- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes Status: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Average Client Active Time: 节点各线程池客户端的平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 -- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 -- 
Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 - -### 3.4 DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- Total Managed Entities: 节点管理的实体情况 -- Write Throughput: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - Average Operation Latency: 节点的各项操作的平均耗时 - - P50 Operation Latency: 节点的各项操作耗时的中位数 - - P99 Operation Latency: 节点的各项操作耗时的P99 -- Thrift统计 - - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS - - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 - - Thrift Connections: 节点的各类型的 Thrfit 连接数量 - - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - - Average Client Active Time: 节点各线程池的客户端平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 - - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 -- Total Tasks: 节点的 Task 数量 -- Task Latency: 节点的 Task 的耗时 -- Compaction - - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 - - Compactions Per Minute: 节点的每分钟合并数量 - - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted-Points Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable -- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable -- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL Files: 节点管理的 WAL 文件数量 - - WAL Nodes: 节点管理的 WAL Node 数量 - - Checkpoint Creation Time: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 - - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 - - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 - - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 - - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 - - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 - - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 - - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 - - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 - - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 - - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 - 
- P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 - - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - Average Query Resource Utilization: 节点查询资源访问数量的平均值 - - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 - - P99 Query Resource Utilization: 节点查询资源访问数量的P99 -- 数据传输耗时 - - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 - - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 - - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 -- 数据传输数量 - - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 - - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - Query Queue Length: 节点查询任务调度数量 - - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 - - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 - - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 - - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 - - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 - - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 - - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 - - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 - - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 - - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 - - P99 Chunk Metadata Filtering Time: 
节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 - - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 - - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 - - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 - - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 - - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 - - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 通过 source handle 获取 TsBlock - - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - P50 Sink 
Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 - - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 - - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 - - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 - - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler Count: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed of Different Stages (avg): 节点 IoT 
Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 diff --git a/src/zh/UserGuide/Master/Table/Reference/System-Tables_apache.md b/src/zh/UserGuide/Master/Table/Reference/System-Tables_apache.md index ebcab5cda..158ef2c78 100644 --- a/src/zh/UserGuide/Master/Table/Reference/System-Tables_apache.md +++ b/src/zh/UserGuide/Master/Table/Reference/System-Tables_apache.md @@ -518,7 +518,6 @@ IoTDB> select * from information_schema.keywords limit 10 | internal\_port | INT32 | ATTRIBUTE | 内部端口 | | version | STRING | ATTRIBUTE | 版本号 | | build\_info | STRING | ATTRIBUTE | CommitID | -| activate\_status(仅企业版) | STRING | ATTRIBUTE | 激活状态 | * 仅管理员可执行操作 * 查询示例: diff --git a/src/zh/UserGuide/Master/Table/User-Manual/Load-Balance.md b/src/zh/UserGuide/Master/Table/User-Manual/Load-Balance.md index 69a934409..88e56ae20 100644 --- a/src/zh/UserGuide/Master/Table/User-Manual/Load-Balance.md +++ b/src/zh/UserGuide/Master/Table/User-Manual/Load-Balance.md @@ -207,7 +207,7 @@ Total line number = 3 It costs 0.110s ``` -7. 其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的,否则需要联系工作人员重新申请激活码信息。 +7. 
其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的。 #### 1.3.3 手动负载均衡(按需选择) diff --git a/src/zh/UserGuide/Master/Table/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/Master/Table/User-Manual/Query-Performance-Analysis.md index 18d34f24a..43d24f16b 100644 --- a/src/zh/UserGuide/Master/Table/User-Manual/Query-Performance-Analysis.md +++ b/src/zh/UserGuide/Master/Table/User-Manual/Query-Performance-Analysis.md @@ -28,7 +28,7 @@ | 方法 | 安装难度 | 业务影响 | 功能范围 | | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------- | ------------------------------------------------------ | | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 | -| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | +| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | ## 1. 
Explain 语句 diff --git a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md index 59851e111..b2c1a3347 100644 --- a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -307,7 +307,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #使用的镜像 + image: apache/iotdb:2.0.x-standalone #使用的镜像 hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md index c96e0c3d8..986ef26fe 100644 --- a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md @@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵 ### 2.1 版本要求 -IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 +IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 ### 2.2 硬盘分区 diff --git a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 303e5b30c..000000000 --- a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,696 +0,0 @@ - -# 监控面板部署 - -IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 - -监控面板工具的使用说明可参考文档 [使用说明](../Tools-System/Monitor-Tool.md) 章节。 
- -## 1. 安装准备 - -1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 -2. 获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 - -## 2. 安装步骤 - -### 2.1 步骤一:IoTDB开启监控指标采集 - -1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 - -| 配置项 | 所在配置文件 | 配置说明 | -| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | -| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | - -以3C3D集群为例,需要修改的监控配置如下: - -| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | -| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT 
dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: - -```shell -# Unix/OS X -./sbin/stop-standalone.sh #先停止confignode和datanode -./sbin/start-confignode.sh -d #启动confignode -./sbin/start-datanode.sh -d #启动datanode - -# Windows -# V2.0.4.x 版本之前 -.\sbin\stop-standalone.bat -.\sbin\start-confignode.bat -.\sbin\start-datanode.bat - -# V2.0.4.x 版本及之后 -.\sbin\windows\stop-standalone.bat -.\sbin\windows\start-confignode.bat -.\sbin\windows\start-datanode.bat -``` - -3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 2.2 步骤二:安装、配置Prometheus - -> 此处以prometheus安装在服务器192.168.1.3为例。 - -1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) -2. 解压安装包,进入解压后的文件夹: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. 修改配置。修改配置文件prometheus.yml如下 - 1. 新增confignode任务收集ConfigNode的监控数据 - 2. 新增datanode任务收集DataNode的监控数据 - -```shell -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 - -
- - - -6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 2.3 步骤三:安装grafana并配置数据源 - -> 此处以Grafana安装在服务器192.168.1.3为例。 - -1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) -2. 解压并进入对应文件夹 - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. 启动Grafana: - -```Shell -./bin/grafana-server web -``` - -4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 - -5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 2.4 步骤四:导入IoTDB Grafana看板 - -1. 进入Grafana,选择Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. 点击右侧 Import 按钮 - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. 使用upload json file的方式导入Dashboard - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 选择数据源为Prometheus,然后点击Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: - -
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. 附录、监控指标详解 - -### 3.1 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Cores:CPU 核数 -- CPU Utilization: - - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total Physical Memory:系统可用物理内存的总量。 - - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total Disk Space:IoTDB 可使用的最大磁盘空间。 - - Used Disk Space:IoTDB 已经使用的磁盘空间。 -- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - All:所有文件数量 - - TsFile:TsFile 数量 - - Seq:顺序 TsFile 数量 - - Unseq:乱序 TsFile 数量 - - WAL:WAL 文件数量 - - Cross-Temp:跨空间合并 temp 文件数量 - - Inner-Seq-Temp:顺序空间内合并 temp 文件数量 - - Innsr-Unseq-Temp:乱序空间内合并 temp 文件数量 - - Mods:墓碑文件数量 -- Open File Handles:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 
attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum Heap Memory:JVM 最大可用的堆内存大小。 - - Committed Heap Memory:JVM 已提交的堆内存大小。 - - Used Heap Memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off-Heap Memory:堆外内存用量。 - - Direct Memory:堆外直接内存。 - - Mapped Memory:堆外映射内存。 -- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 -- Loaded & Unloaded Classes: - - Loaded:JVM 目前已经加载的类的数量 - - Unloaded:系统启动至今 JVM 卸载的类的数量 -- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Network Speed:网卡发送和接收数据的速度 -- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 3.2 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Cores: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 -- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 -- Total Files: 集群管理文件总数量 -- Cluster System Overview: 
集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBases: 集群管理的 Database 总数(含副本) -- Total DataRegions: 集群管理的 DataRegion 总数 -- Total SchemaRegions: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Cores: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Time Series: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Throughput: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Count: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 -- Average Interface Latency: 节点的各个 thrift 接口平均耗时 -- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 -- Total Tasks: 节点的各项系统任务数量 -- Average Task Latency: 节点的各项系统任务的平均耗时 -- P99 Task Latency: 节点的各项系统任务的 P99 耗时 -- Operations Per Second: 节点的每秒操作数 -- 主流程 - - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 - - Average Stage Latency: 节点主流程各阶段平均耗时 - - P99 Stage Latency: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 - - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 - - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 - - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 - - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 - - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 - - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 - - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Utilization: 节点的 CPU 负载 -- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- Off-Heap 
Memory: 节点的非堆内存使用情况 -- Total Java Threads: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Logs Per Minute: 节点的每分钟不同类型日志情况 - -### 3.3 ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Status: 节点的 SchemaRegion 的状态 -- System Memory Utilization: 节点的系统内存大小 -- Swap Memory Utilization: 节点的交换区内存大小 -- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes Status: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Average Client Active Time: 节点各线程池客户端的平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 -- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 -- 
Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 - -### 3.4 DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- Total Managed Entities: 节点管理的实体情况 -- Write Throughput: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - Average Operation Latency: 节点的各项操作的平均耗时 - - P50 Operation Latency: 节点的各项操作耗时的中位数 - - P99 Operation Latency: 节点的各项操作耗时的P99 -- Thrift统计 - - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS - - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 - - Thrift Connections: 节点的各类型的 Thrfit 连接数量 - - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - - Average Client Active Time: 节点各线程池的客户端平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 - - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 -- Total Tasks: 节点的 Task 数量 -- Task Latency: 节点的 Task 的耗时 -- Compaction - - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 - - Compactions Per Minute: 节点的每分钟合并数量 - - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted-Points Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable -- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable -- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL Files: 节点管理的 WAL 文件数量 - - WAL Nodes: 节点管理的 WAL Node 数量 - - Checkpoint Creation Time: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 - - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 - - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 - - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 - - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 - - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 - - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 - - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 - - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 - - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 - - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 - 
- P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 - - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - Average Query Resource Utilization: 节点查询资源访问数量的平均值 - - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 - - P99 Query Resource Utilization: 节点查询资源访问数量的P99 -- 数据传输耗时 - - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 - - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 - - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 -- 数据传输数量 - - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 - - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - Query Queue Length: 节点查询任务调度数量 - - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 - - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 - - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 - - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 - - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 - - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 - - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 - - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 - - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 - - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 - - P99 Chunk Metadata Filtering Time: 
节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 - - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 - - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 - - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 - - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 - - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 - - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 通过 source handle 获取 TsBlock - - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - P50 Sink 
Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 - - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 - - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 - - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 - - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler Count: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed of Different Stages (avg): 节点 IoT 
Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 diff --git a/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md b/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md index c23599b95..b8e0fcfb8 100644 --- a/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md +++ b/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/DataEase.md @@ -44,12 +44,12 @@ | :-------------------- | :----------------------------------------------------------- | | IoTDB | 版本无要求,安装请参考 IoTDB [部署指导](../QuickStart/QuickStart_apache.md) | | JDK | 建议 JDK11 及以上版本(推荐部署 JDK17 及以上版本) | -| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x,其他版本适配请联系工作人员) | -| DataEase-IoTDB 连接器 | 请联系工作人员获取 | +| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x) | +| DataEase-IoTDB 连接器 | 获取安装包 | ## 3. 
安装步骤

-步骤一:请联系商务获取压缩包,解压缩安装包( iotdb-api-source-1.0.0.zip )
+步骤一:解压缩安装包( iotdb-api-source-1.0.0.zip )

 步骤二:解压后,修改`config`文件夹中的配置文件`application.properties`
diff --git a/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md b/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
index d3bdd017a..e34e02362 100644
--- a/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
+++ b/src/zh/UserGuide/Master/Tree/Ecosystem-Integration/Thingsboard.md
@@ -42,13 +42,13 @@
 | :-------------------------- | :----------------------------------------------------------- |
 | JDK | 要求已安装 17 及以上版本,具体下载请查看 [Oracle 官网](https://www.oracle.com/java/technologies/downloads/) |
 | IoTDB | 要求已安装 V1.3.0 及以上版本,具体安装过程请参考[ 部署指导](../QuickStart/QuickStart_apache.md) |
-| ThingsBoard(IoTDB 适配版) | 安装包请联系商务获取,具体安装步骤参见下文 |
+| ThingsBoard(IoTDB 适配版) | 获取安装包,具体安装步骤参见下文 |

 ## 3. 安装步骤

 具体安装步骤请参考 [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)。其中:

-- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用上方从商务获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb)
+- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb)
 - [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 3 配置 ThingsBoard 数据库-ThingsBoard 配置】步骤中需要按照下方内容添加环境变量

 ```Bash
diff --git a/src/zh/UserGuide/Master/Tree/Reference/UDF-Libraries_apache.md b/src/zh/UserGuide/Master/Tree/Reference/UDF-Libraries_apache.md
index dd0ccbbd5..95213efd1 100644
--- a/src/zh/UserGuide/Master/Tree/Reference/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/Master/Tree/Reference/UDF-Libraries_apache.md
@@ -27,10 +27,10 @@
 ## 1. 安装步骤

 1.
请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。
- | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 |
- | --------------- | ----------------- | ------------------------------------------------------------ |
- | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 |
- | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取|
+ | UDF 安装包 | 支持的 IoTDB 版本 |
+ | --------------- | ----------------- |
+ | apache-UDF-1.3.3.zip | V1.3.3及以上 |
+ | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 |
 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下
 3. 在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。
diff --git a/src/zh/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
index cf49274bd..5d056b8a9 100644
--- a/src/zh/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/Master/Tree/SQL-Manual/UDF-Libraries_apache.md
@@ -28,10 +28,10 @@
 ## 1. 安装步骤

 1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。
- | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 |
- | -------------------- | ------------- | --------- |
- | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 |
- | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取 |
+ | UDF 安装包 | 支持的 IoTDB 版本 |
+ | -------------------- | ------------- |
+ | apache-UDF-1.3.3.zip | V1.3.3及以上 |
+ | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 |
 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下
 3. 在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。
 4. 批量注册:两种注册方式:注册脚本 或 SQL汇总语句
diff --git a/src/zh/UserGuide/Master/Tree/User-Manual/Load-Balance.md b/src/zh/UserGuide/Master/Tree/User-Manual/Load-Balance.md
index 0be76a388..60d10c7c7 100644
--- a/src/zh/UserGuide/Master/Tree/User-Manual/Load-Balance.md
+++ b/src/zh/UserGuide/Master/Tree/User-Manual/Load-Balance.md
@@ -207,7 +207,7 @@
 Total line number = 3
 It costs 0.110s
 ```

-7. 其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的,否则需要联系工作人员重新申请激活码信息。
+7.
其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的。

 #### 1.3.3 手动负载均衡(按需选择)
diff --git a/src/zh/UserGuide/Master/Tree/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/Master/Tree/User-Manual/Query-Performance-Analysis.md
index 0debbeb6c..09d874795 100644
--- a/src/zh/UserGuide/Master/Tree/User-Manual/Query-Performance-Analysis.md
+++ b/src/zh/UserGuide/Master/Tree/User-Manual/Query-Performance-Analysis.md
@@ -28,7 +28,7 @@
 | 方法 | 安装难度 | 业务影响 | 功能范围 |
 | :------------------ | :----------------------------------------------------------- | :--------------------------------------------------- | :----------------------------------------------------- |
 | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 |
-| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |
+| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |
 | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |

 ## 1. Explain 语句
diff --git a/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md b/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md
deleted file mode 100644
index 67d324b64..000000000
--- a/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Monitoring-Board-Install-and-Deploy.md
+++ /dev/null
@@ -1,158 +0,0 @@
-
-
-# 监控面板安装部署
-从 Apache IoTDB 1.0 版本开始,我们引入了系统监控模块,可以完成对 Apache IoTDB 的重要运行指标进行监控,本文介绍了如何在 Apache IoTDB 分布式开启系统监控模块,并且使用 Prometheus + Grafana 的方式完成对系统监控指标的可视化。
-
-## 前期准备
-
-### 软件要求
-
-1. Apache IoTDB:1.0 版本及以上,可以前往官网下载:https://iotdb.apache.org/Download/
-2. Prometheus:2.30.3 版本及以上,可以前往官网下载:https://prometheus.io/download/
-3. Grafana:8.4.2 版本及以上,可以前往官网下载:https://grafana.com/grafana/download
-4.
IoTDB-Grafana安装包:Grafana看板为 IoTDB的企业版工具,您可联系您的销售获取相关安装包 - -### 集群要求 - -进行以下操作前请确认IoTDB集群已启动。 - -### 说明 - -本文将在一台机器(1 个 ConfigNode 和 1 个 DataNode)环境上进行监控面板搭建,其他集群配置是类似的,用户可以根据自己的集群情况(ConfigNode 和 DataNode 的数量)进行配置调整。本文搭建的集群的基本配置信息如下表所示。 - -| 集群角色 | 节点IP | 监控模块推送器 | 监控模块级别 | 监控 Port | -| ---------- | --------- | -------------- | ------------ | --------- | -| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9091 | -| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9093 | - -## 配置 Prometheus 采集监控指标 - -1. 下载安装包。下载Prometheus的二进制包到本地,解压后进入对应文件夹: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -2. 修改配置。修改Prometheus的配置文件prometheus.yml如下 - a. 新增 confignode 任务收集 ConfigNode 的监控数据 - b. 新增 datanode 任务收集 DataNode 的监控数据 - -```YAML -global: - scrape_interval: 15s - -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["localhost:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["localhost:9093"] - honor_labels: true -``` - -3. 启动Promethues。Prometheus 监控数据的默认过期时间为 15d。在生产环境中,建议将其调整为 180d 以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -4. 确认启动成功。在浏览器中输入 http://localhost:9090,进入Prometheus,点击进入Status下的Target界面(如下图1),当看到State均为Up时表示配置成功并已经联通(如下图2),点击左侧链接可以跳转到网页监控。 - -![](/img/1a.png) -![](/img/2a.png) - - - -## 使用 Grafana 查看监控数据 - -### Step1:Grafana 安装、配置与启动 - -1. 下载Grafana的二进制包到本地,解压后进入对应文件夹: - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -2. 启动Grafana并进入: - -```Shell -./bin/grafana-server web -``` - -3. 在浏览器中输入 http://localhost:3000,进入Grafana,默认初始用户名和密码均为 admin。 -4. 首先我们在 Configuration 中配置 Data Source 为 Prometheus - -![](/img/3a.png) - -5. 在配置 Data Source 时注意 Prometheus 所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 - -![](/img/4a.png) - -### Step2:使用IoTDB官方提供的Grafana看板 - -1. 
Enter Grafana and select Browse under Dashboards
-
-![](/img/5a.png)
-
-2. Click the Import button on the right
-
-![](/img/6a.png)
-
-3. Choose one way to import the Dashboard
-   a. Upload a Dashboard JSON file that you have downloaded locally
-   b. Enter the URL or ID of a Dashboard obtained from the Grafana website
-   c. Paste the contents of the Dashboard JSON file directly
-
-![](/img/7a.png)
-
-4. Set the Dashboard's Prometheus to the Data Source you just configured, then click Import
-
-![](/img/8a.png)
-
-5. Then enter the Dashboard and select ConfigNode as the job to see the following monitoring panel
-
-![](/img/9a.png)
-
-6. Similarly, you can import the Apache DataNode Dashboard and select DataNode as the job to see the following monitoring panel:
-
-![](/img/10a.png)
-
-### Step 3: Create a New Dashboard for Data Visualization
-
-1. First create a Dashboard, then create a Panel
-
-![](/img/11a.png)
-
-2. You can then visualize the monitoring data in the panel as needed (all related metrics can first be filtered by selecting confignode/datanode in job)
-
-![](/img/12a.png)
-
-3. After selecting and visualizing the metrics of interest, you get a panel like this:
-
-![](/img/13a.png)
\ No newline at end of file
diff --git a/src/zh/UserGuide/V1.2.x/Reference/UDF-Libraries.md b/src/zh/UserGuide/V1.2.x/Reference/UDF-Libraries.md
index 45a72931f..8df08b94a 100644
--- a/src/zh/UserGuide/V1.2.x/Reference/UDF-Libraries.md
+++ b/src/zh/UserGuide/V1.2.x/Reference/UDF-Libraries.md
@@ -27,9 +27,9 @@
 ## Installation Steps
 1. Obtain the compressed package of the UDF library JAR that is compatible with your IoTDB version.
-    | UDF Library Version | Supported IoTDB Versions | Download Link |
-    | --------------- | ----------------- | ------------------------------------------------------------ |
-    | UDF-1.2.x.zip | V1.0.0~V1.2.x | Contact staff to obtain |
+    | UDF Library Version | Supported IoTDB Versions |
+    | --------------- | ----------------- |
+    | UDF-1.2.x.zip | V1.0.0~V1.2.x |
 2. Place the `library-udf.jar` file from the obtained package in the `/ext/udf` directory of every node in the IoTDB cluster
 3. 
在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行下述相应的函数注册语句。 diff --git a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Database-Resources.md b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Database-Resources.md index 6395e1c05..fbcddaf77 100644 --- a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Database-Resources.md +++ b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Database-Resources.md @@ -72,11 +72,11 @@ 48核 1 2 - 请联系工作人员咨询 + 建议按需评估 1000w以上 - 请联系工作人员咨询 + 建议按需评估 @@ -134,11 +134,11 @@ 128G 1 2 - 请联系工作人员咨询 + 建议按需评估 1000w以上 - 请联系工作人员咨询 + 建议按需评估 diff --git a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md index 63f0a929d..bfcb4ff98 100644 --- a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -307,7 +307,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像 + image: apache/iotdb:1.3.x-standalone #使用的镜像 hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] diff --git a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Environment-Requirements.md index 9b91e2952..a0224de4e 100644 --- a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Environment-Requirements.md @@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵 ### 版本要求 -IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 +IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 ### 硬盘分区 diff --git a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md 
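b/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md
deleted file mode 100644
index b0bc2dcd4..000000000
--- a/src/zh/UserGuide/V1.3.x/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ /dev/null
@@ -1,684 +0,0 @@
-
-# Monitoring Panel Deployment
-
-The IoTDB monitoring panel is one of the supporting tools of IoTDB Enterprise Edition. It addresses the monitoring of IoTDB and the operating system it runs on, covering operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel metrics, to help users monitor cluster health, tune the cluster, and carry out operations and maintenance. Using a typical 3C3D cluster (3 ConfigNodes and 3 DataNodes) as an example, this guide shows how to enable the system monitoring module in an IoTDB instance and visualize the monitoring metrics with Prometheus + Grafana.
-
-For instructions on using the monitoring panel tool, see the [Instructions](../Tools-System/Monitor-Tool.md) section.
-
-## Installation Preparation
-
-1. Install IoTDB: IoTDB Enterprise Edition V1.0 or later must be installed first. Contact sales or technical support to obtain it
-2. Obtain the IoTDB monitoring panel installation package: the database monitoring panel is based on IoTDB Enterprise Edition. Contact sales or technical support to obtain it
-
-## Installation Steps
-
-### Step 1: Enable Monitoring Metrics Collection in IoTDB
-
-1. Enable the monitoring configuration options. The monitoring-related options in IoTDB are disabled by default. Before deploying the monitoring panel, enable the relevant options (note that the service must be restarted after the monitoring configuration is enabled).
-
-| Configuration                      | Configuration File               | Description                                                  |
-| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- |
-| cn_metric_reporter_list            | conf/iotdb-system.properties     | Uncomment the option and set its value to PROMETHEUS |
-| cn_metric_level                    | conf/iotdb-system.properties     | Uncomment the option and set its value to IMPORTANT |
-| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties     | Uncomment the option; keep the default 9091, or set another port that does not conflict with any port in use |
-| dn_metric_reporter_list            | conf/iotdb-system.properties     | Uncomment the option and set its value to PROMETHEUS |
-| dn_metric_level                    | conf/iotdb-system.properties     | Uncomment the option and set its value to IMPORTANT |
-| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties     | Uncomment the option; the default 9092 can be used, or set another port that does not conflict with any port in use |
-
-Taking the 3C3D cluster as an example, the monitoring configuration to modify is as follows:
-
-| Node IP     | Hostname | Cluster Role | Configuration File Path          | Configuration                                                |
-| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ |
-| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties     | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties     | cn_metric_reporter_list=PROMETHEUS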
cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties     | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.3 | iotdb-1 | datanode   | conf/iotdb-system.properties     | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.4 | iotdb-2 | datanode   | conf/iotdb-system.properties     | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.5 | iotdb-3 | datanode   | conf/iotdb-system.properties     | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-
-2. Restart all nodes. After modifying the monitoring configuration on the 3 nodes, restart the confignode and datanode services on all nodes:
-
-```shell
-./sbin/stop-standalone.sh      # stop the confignode and datanode first
-./sbin/start-confignode.sh -d  # start the confignode
-./sbin/start-datanode.sh -d    # start the datanode
-```
-
-3. After the restart, check the running status of each node from the client. If every node's status is Running, the configuration succeeded:
-
-![](/img/%E5%90%AF%E5%8A%A8.png)
-
-### Step 2: Install and Configure Prometheus
-
-> This example assumes Prometheus is installed on server 192.168.1.3.
-
-1. Download the Prometheus package, V2.30.3 or later, from the Prometheus website (https://prometheus.io/docs/introduction/first_steps/)
-2. Extract the package and enter the extracted folder:
-
-```Shell
-tar xvfz prometheus-*.tar.gz
-cd prometheus-*
-```
-
-3. Modify the configuration. Edit the configuration file prometheus.yml as follows:
-   1. Add a confignode job to collect monitoring data from the ConfigNodes
-   2. Add a datanode job to collect monitoring data from the DataNodes
-
-```shell
-global:
-  scrape_interval: 15s
-  evaluation_interval: 15s
-scrape_configs:
-  - job_name: "prometheus"
-    static_configs:
-      - targets: ["localhost:9090"]
-  - job_name: "confignode"
-    static_configs:
-      - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"]
-    honor_labels: true
-  - job_name: "datanode"
-    static_configs:
-      - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"]
-    honor_labels: true
-```
-
-4. 
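Start Prometheus. The default retention time for Prometheus monitoring data is 15 days. In production, it is recommended to raise it to 180 days or more so that a longer history of monitoring data can be tracked. The start command is:
-
-```Shell
-./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d
-```
-
-5. Confirm that startup succeeded. Open http://192.168.1.3:9090 in a browser to enter Prometheus, then go to the Target page under Status. When every State shows Up, the configuration succeeded and the targets are connected.
-
-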
- - -
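-
-
-
-6. Click a link on the left side of Targets to jump to the web monitoring page and view the monitoring information of the corresponding node:
-
-![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png)
-
-### Step 3: Install Grafana and Configure the Data Source
-
-> This example assumes Grafana is installed on server 192.168.1.3.
-
-1. Download the Grafana package, V8.4.2 or later, from the Grafana website (https://grafana.com/grafana/download)
-2. Extract the package and enter the extracted folder
-
-```Shell
-tar -zxvf grafana-*.tar.gz
-cd grafana-*
-```
-
-3. Start Grafana:
-
-```Shell
-./bin/grafana-server web
-```
-
-4. Log in to Grafana. Open http://192.168.1.3:3000 (or the port you changed it to) in a browser to enter Grafana. The default initial username and password are both admin.
-
-5. Configure the data source. Find Data sources under Connections, add a new data source, and set the Data Source to Prometheus
-
-![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png)
-
-When configuring the Data Source, make sure the URL points to where Prometheus is running. After configuring, click Save & Test; the message Data source is working indicates that the configuration succeeded
-
-![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png)
-
-### Step 4: Import the IoTDB Grafana Dashboards
-
-1. Enter Grafana and select Dashboards:
-
-    ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png)
-
-2. Click the Import button on the right
-
-    ![](/img/Import%E6%8C%89%E9%92%AE.png)
-
-3. Import the Dashboard using the upload json file option
-
-    ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png)
-
-4. Select the JSON file of one of the IoTDB monitoring panels. The Apache IoTDB ConfigNode Dashboard is used here as an example (for how to obtain the monitoring panel installation package, see "Installation Preparation" in this guide):
-
-    ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png)
-
-5. Select Prometheus as the data source, then click Import
-
-    ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png)
-
-6. The imported Apache IoTDB ConfigNode Dashboard monitoring panel is then displayed
-
-    ![](/img/%E9%9D%A2%E6%9D%BF.png)
-
-7. Similarly, you can import the Apache IoTDB DataNode Dashboard, Apache Performance Overview Dashboard, and Apache System Overview Dashboard to see the following monitoring panels:
-
-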
- - - -
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 附录、监控指标详解 - -### 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Core:CPU 核数 -- CPU Load: - - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total physical memory:系统可用物理内存的总量。 - - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total disk space:IoTDB 可使用的最大磁盘空间。 - - Used disk space:IoTDB 已经使用的磁盘空间。 -- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - all:所有文件数量 - - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 
actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Core: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 -- Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 
总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Core: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 -- 主流程 - - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- 
Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 - -### ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 
的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 - -### DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 -- Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 -- Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 
节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 -- 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 -- 数据传输数量 - - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 
节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 
通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 
的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md b/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md index 9caa2bad2..f0b4c63ad 100644 --- a/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md +++ b/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/DataEase.md @@ -44,12 +44,12 @@ | :-------------------- | 
:----------------------------------------------------------- | | IoTDB | 版本无要求,安装请参考 IoTDB [部署指导](../QuickStart/QuickStart_apache.md) | | JDK | 建议 JDK11 及以上版本(推荐部署 JDK17 及以上版本) | -| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x,其他版本适配请联系工作人员) | -| DataEase-IoTDB 连接器 | 请联系工作人员获取 | +| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x) | +| DataEase-IoTDB 连接器 | 获取安装包 | ## 安装步骤 -步骤一:请联系商务获取压缩包,解压缩安装包( iotdb-api-source-1.0.0.zip ) +步骤一:解压缩安装包( iotdb-api-source-1.0.0.zip ) 步骤二:解压后,修改`config`文件夹中的配置文件`application.properties` diff --git a/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md b/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md index 67bd9f7e3..0ef3b3109 100644 --- a/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md +++ b/src/zh/UserGuide/V1.3.x/Ecosystem-Integration/Thingsboard.md @@ -42,13 +42,13 @@ | :-------------------------- | :----------------------------------------------------------- | | JDK | 要求已安装 17 及以上版本,具体下载请查看 [Oracle 官网](https://www.oracle.com/java/technologies/downloads/) | | IoTDB | 要求已安装 V1.3.0 及以上版本,具体安装过程请参考[ 部署指导](../QuickStart/QuickStart_apache.md) | -| ThingsBoard(IoTDB 适配版) | 安装包请联系商务获取,具体安装步骤参见下文 | +| ThingsBoard(IoTDB 适配版) | 获取安装包,具体安装步骤参见下文 | ## 安装步骤 具体安装步骤请参考 [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)。其中: -- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用上方从商务获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb) +- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb) - [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 3 配置 ThingsBoard 数据库-ThingsBoard 配置】步骤中需要按照下方内容添加环境变量 ```Bash diff --git a/src/zh/UserGuide/V1.3.x/Reference/UDF-Libraries_apache.md 
b/src/zh/UserGuide/V1.3.x/Reference/UDF-Libraries_apache.md
index 927478823..5209747b4 100644
--- a/src/zh/UserGuide/V1.3.x/Reference/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/V1.3.x/Reference/UDF-Libraries_apache.md
@@ -27,10 +27,10 @@
 ## 安装步骤
 1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。
 
- | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 |
- | --------------- | ----------------- | ------------------------------------------------------------ |
- | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 |
- | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取|
+ | UDF 安装包 | 支持的 IoTDB 版本 |
+ | --------------- | ----------------- |
+ | apache-UDF-1.3.3.zip | V1.3.3及以上 |
+ | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 |
 
 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下
 3. 在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。
diff --git a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
index c95babc1a..6eb2a012d 100644
--- a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
@@ -27,10 +27,10 @@
 ## 安装步骤
 1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。
 
- | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 |
- | --------------- | ----------------- | ------------------------------------------------------------ |
- | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 |
- | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取|
+ | UDF 安装包 | 支持的 IoTDB 版本 |
+ | --------------- | ----------------- |
+ | apache-UDF-1.3.3.zip | V1.3.3及以上 |
+ | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 |
 
 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下
 3.
在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。 diff --git a/src/zh/UserGuide/V1.3.x/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/V1.3.x/User-Manual/Query-Performance-Analysis.md index bd89214ef..19d29de93 100644 --- a/src/zh/UserGuide/V1.3.x/User-Manual/Query-Performance-Analysis.md +++ b/src/zh/UserGuide/V1.3.x/User-Manual/Query-Performance-Analysis.md @@ -30,7 +30,7 @@ | 方法 | 安装难度 | 业务影响 | 功能范围 | | :------------------ | :----------------------------------------------------------- | :--------------------------------------------------- | :----------------------------------------------------- | | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 | -| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | +| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | ### Explain 语句 diff --git a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Database-Resources.md b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Database-Resources.md index 6395e1c05..fbcddaf77 100644 --- a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Database-Resources.md +++ b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Database-Resources.md @@ -72,11 +72,11 @@ 48核 1 2 - 请联系工作人员咨询 + 建议按需评估 1000w以上 - 请联系工作人员咨询 + 建议按需评估 @@ -134,11 +134,11 @@ 128G 1 2 - 请联系工作人员咨询 + 建议按需评估 1000w以上 - 请联系工作人员咨询 + 建议按需评估 diff --git a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md index 63f0a929d..bfcb4ff98 100644 --- a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -307,7 +307,7 @@ services: version: "3" 
 services:
   iotdb-datanode:
-    image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像
+    image: apache/iotdb:1.3.x-standalone #使用的镜像
     hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一
     container_name: iotdb-datanode
     command: ["bash", "-c", "entrypoint.sh datanode"]
diff --git a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Environment-Requirements.md
index 9b91e2952..a0224de4e 100644
--- a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Environment-Requirements.md
+++ b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Environment-Requirements.md
@@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵
 
 ### 版本要求
 
-IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。
+IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。
 
 ### 硬盘分区
 
diff --git a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md
deleted file mode 100644
index b0bc2dcd4..000000000
--- a/src/zh/UserGuide/dev-1.3/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ /dev/null
@@ -1,684 +0,0 @@
-
-# 监控面板部署
-
-IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。
-
-监控面板工具的使用说明可参考文档 [使用说明](../Tools-System/Monitor-Tool.md) 章节。
-
-## 安装准备
-
-1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取
-2. 获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取
-
-## 安装步骤
-
-### 步骤一:IoTDB开启监控指标采集
-
-1.
打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 - -| 配置项 | 所在配置文件 | 配置说明 | -| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | -| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | - -以3C3D集群为例,需要修改的监控配置如下: - -| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | -| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT 
dn_metric_prometheus_reporter_port=9092 | - -2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: - -```shell -./sbin/stop-standalone.sh #先停止confignode和datanode -./sbin/start-confignode.sh -d #启动confignode -./sbin/start-datanode.sh -d #启动datanode -``` - -3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 步骤二:安装、配置Prometheus - -> 此处以prometheus安装在服务器192.168.1.3为例。 - -1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) -2. 解压安装包,进入解压后的文件夹: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. 修改配置。修改配置文件prometheus.yml如下 - 1. 新增confignode任务收集ConfigNode的监控数据 - 2. 新增datanode任务收集DataNode的监控数据 - -```shell -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 - -
- - -
- - - -6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 步骤三:安装grafana并配置数据源 - -> 此处以Grafana安装在服务器192.168.1.3为例。 - -1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) -2. 解压并进入对应文件夹 - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. 启动Grafana: - -```Shell -./bin/grafana-server web -``` - -4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 - -5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 步骤四:导入IoTDB Grafana看板 - -1. 进入Grafana,选择Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. 点击右侧 Import 按钮 - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. 使用upload json file的方式导入Dashboard - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 选择数据源为Prometheus,然后点击Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: - -
- - - -
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 附录、监控指标详解 - -### 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Core:CPU 核数 -- CPU Load: - - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total physical memory:系统可用物理内存的总量。 - - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total disk space:IoTDB 可使用的最大磁盘空间。 - - Used disk space:IoTDB 已经使用的磁盘空间。 -- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - all:所有文件数量 - - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 
actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Core: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 -- Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 
总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Core: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 -- 主流程 - - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- 
Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 - -### ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 
的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 - -### DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 -- Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 -- Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 
节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 -- 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 -- 数据传输数量 - - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 
节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 
通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 
的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md b/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md index 9caa2bad2..f0b4c63ad 100644 --- a/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md +++ b/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/DataEase.md @@ -44,12 +44,12 @@ | :-------------------- | 
:----------------------------------------------------------- | | IoTDB | 版本无要求,安装请参考 IoTDB [部署指导](../QuickStart/QuickStart_apache.md) | | JDK | 建议 JDK11 及以上版本(推荐部署 JDK17 及以上版本) | -| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x,其他版本适配请联系工作人员) | -| DataEase-IoTDB 连接器 | 请联系工作人员获取 | +| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x) | +| DataEase-IoTDB 连接器 | 获取安装包 | ## 安装步骤 -步骤一:请联系商务获取压缩包,解压缩安装包( iotdb-api-source-1.0.0.zip ) +步骤一:解压缩安装包( iotdb-api-source-1.0.0.zip ) 步骤二:解压后,修改`config`文件夹中的配置文件`application.properties` diff --git a/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md b/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md index 67bd9f7e3..0ef3b3109 100644 --- a/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md +++ b/src/zh/UserGuide/dev-1.3/Ecosystem-Integration/Thingsboard.md @@ -42,13 +42,13 @@ | :-------------------------- | :----------------------------------------------------------- | | JDK | 要求已安装 17 及以上版本,具体下载请查看 [Oracle 官网](https://www.oracle.com/java/technologies/downloads/) | | IoTDB | 要求已安装 V1.3.0 及以上版本,具体安装过程请参考[ 部署指导](../QuickStart/QuickStart_apache.md) | -| ThingsBoard(IoTDB 适配版) | 安装包请联系商务获取,具体安装步骤参见下文 | +| ThingsBoard(IoTDB 适配版) | 获取安装包,具体安装步骤参见下文 | ## 安装步骤 具体安装步骤请参考 [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)。其中: -- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用上方从商务获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb) +- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb) - [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 3 配置 ThingsBoard 数据库-ThingsBoard 配置】步骤中需要按照下方内容添加环境变量 ```Bash diff --git a/src/zh/UserGuide/dev-1.3/Reference/UDF-Libraries_apache.md 
b/src/zh/UserGuide/dev-1.3/Reference/UDF-Libraries_apache.md index 927478823..5209747b4 100644 --- a/src/zh/UserGuide/dev-1.3/Reference/UDF-Libraries_apache.md +++ b/src/zh/UserGuide/dev-1.3/Reference/UDF-Libraries_apache.md @@ -27,10 +27,10 @@ ## 安装步骤 1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 - | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取| + | UDF 安装包 | 支持的 IoTDB 版本 | + | --------------- | ----------------- | + | apache-UDF-1.3.3.zip | V1.3.3及以上 | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 3. 在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。 diff --git a/src/zh/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md index 6708ac887..0cb3b588b 100644 --- a/src/zh/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md +++ b/src/zh/UserGuide/dev-1.3/SQL-Manual/UDF-Libraries_apache.md @@ -27,10 +27,10 @@ ## 安装步骤 1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 - | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取| + | UDF 安装包 | 支持的 IoTDB 版本 | + | --------------- | ----------------- | + | apache-UDF-1.3.3.zip | V1.3.3及以上 | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 3. 
在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。 diff --git a/src/zh/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md b/src/zh/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md index 4f0e850b2..32c18a07d 100644 --- a/src/zh/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md +++ b/src/zh/UserGuide/dev-1.3/Tools-System/Monitor-Tool_apache.md @@ -22,8 +22,6 @@ # Prometheus -监控工具的部署可参考文档 [监控面板部署](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) 章节。 - ## 监控指标的 Prometheus 映射关系 > 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,其中 value 为具体值 diff --git a/src/zh/UserGuide/dev-1.3/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/dev-1.3/User-Manual/Query-Performance-Analysis.md index bd89214ef..19d29de93 100644 --- a/src/zh/UserGuide/dev-1.3/User-Manual/Query-Performance-Analysis.md +++ b/src/zh/UserGuide/dev-1.3/User-Manual/Query-Performance-Analysis.md @@ -30,7 +30,7 @@ | 方法 | 安装难度 | 业务影响 | 功能范围 | | :------------------ | :----------------------------------------------------------- | :--------------------------------------------------- | :----------------------------------------------------- | | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 | -| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | +| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | ### Explain 语句 diff --git a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md index 55ee7dbee..0ddb744f2 100644 --- a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ 
-307,7 +307,7 @@ services: version: "3" services: iotdb-datanode: - image: iotdb-enterprise:2.0.x-standalone #使用的镜像 + image: apache/iotdb:2.0.x-standalone #使用的镜像 hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 container_name: iotdb-datanode command: ["bash", "-c", "entrypoint.sh datanode"] @@ -426,4 +426,4 @@ docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf cd /docker-iotdb docker-compose -f confignode.yml up -d docker-compose -f datanode.yml up -d -``` \ No newline at end of file +``` diff --git a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Environment-Requirements.md index 81b20598e..c54e4be1d 100644 --- a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵 ### 2.1 版本要求 -IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 +IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 ### 2.2 硬盘分区 @@ -206,4 +206,4 @@ ulimit -n } #添加JDK环境变量 source ~/.bashrc #配置环境生效 java -version #检查JDK环境 -``` \ No newline at end of file +``` diff --git a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 63264b5dd..000000000 --- a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,694 +0,0 @@ - -# 监控面板部署 - -IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 - -## 1. 安装准备 - -1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 -2. 
获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 - -## 2. 安装步骤 - -### 步骤一:IoTDB开启监控指标采集 - -1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 - -| 配置项 | 所在配置文件 | 配置说明 | -| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | -| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | -| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | - -以3C3D集群为例,需要修改的监控配置如下: - -| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | -| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | 
conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: - -```shell -# Unix/OS X -./sbin/stop-standalone.sh #先停止confignode和datanode -./sbin/start-confignode.sh -d #启动confignode -./sbin/start-datanode.sh -d #启动datanode - -# Windows -# V2.0.4.x 版本之前 -.\sbin\stop-standalone.bat -.\sbin\start-confignode.bat -.\sbin\start-datanode.bat - -# V2.0.4.x 版本及之后 -.\sbin\windows\stop-standalone.bat -.\sbin\windows\start-confignode.bat -.\sbin\windows\start-datanode.bat -``` - -3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### 步骤二:安装、配置Prometheus - -> 此处以prometheus安装在服务器192.168.1.3为例。 - -1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) -2. 解压安装包,进入解压后的文件夹: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. 修改配置。修改配置文件prometheus.yml如下 - 1. 新增confignode任务收集ConfigNode的监控数据 - 2. 新增datanode任务收集DataNode的监控数据 - -```shell -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 - -
- - - -6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### 步骤三:安装grafana并配置数据源 - -> 此处以Grafana安装在服务器192.168.1.3为例。 - -1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) -2. 解压并进入对应文件夹 - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. 启动Grafana: - -```Shell -./bin/grafana-server web -``` - -4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 - -5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### 步骤四:导入IoTDB Grafana看板 - -1. 进入Grafana,选择Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. 点击右侧 Import 按钮 - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. 使用upload json file的方式导入Dashboard - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. 选择数据源为Prometheus,然后点击Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: - -
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. 附录、监控指标详解 - -### 3.1 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Cores:CPU 核数 -- CPU Utilization: - - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total Physical Memory:系统可用物理内存的总量。 - - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total Disk Space:IoTDB 可使用的最大磁盘空间。 - - Used Disk Space:IoTDB 已经使用的磁盘空间。 -- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - All:所有文件数量 - - TsFile:TsFile 数量 - - Seq:顺序 TsFile 数量 - - Unseq:乱序 TsFile 数量 - - WAL:WAL 文件数量 - - Cross-Temp:跨空间合并 temp 文件数量 - - Tnner-Seq-Temp:顺序空间内合并 temp 文件数量 - - Innser-Unseq-Temp:乱序空间内合并 temp 文件数量 - - Mods:墓碑文件数量 -- Open File Handles:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 
和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum Heap Memory:JVM 最大可用的堆内存大小。 - - Committed Heap Memory:JVM 已提交的堆内存大小。 - - Used Heap Memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off-Heap Memory:堆外内存用量。 - - Direct Memory:堆外直接内存。 - - Mapped Memory:堆外映射内存。 -- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 -- Loaded & Unloaded Classes: - - Loaded:JVM 目前已经加载的类的数量 - - Unloaded:系统启动至今 JVM 卸载的类的数量 -- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Network Speed:网卡发送和接收数据的速度 -- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 3.2 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Cores: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 -- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 -- Total Files: 集群管理文件总数量 -- Cluster System Overview: 
集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBases: 集群管理的 Database 总数(含副本) -- Total DataRegions: 集群管理的 DataRegion 总数 -- Total SchemaRegions: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Cores: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Time Series: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Throughput: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Count: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 -- Average Interface Latency: 节点的各个 thrift 接口平均耗时 -- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 -- Total Tasks: 节点的各项系统任务数量 -- Average Task Latency: 节点的各项系统任务的平均耗时 -- P99 Task Latency: 节点的各项系统任务的 P99 耗时 -- Operations Per Second: 节点的每秒操作数 -- 主流程 - - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 - - Average Stage Latency: 节点主流程各阶段平均耗时 - - P99 Stage Latency: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 - - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 - - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 - - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 - - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 - - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 - - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 - - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Utilization: 节点的 CPU 负载 -- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- Off-Heap 
Memory: 节点的非堆内存使用情况 -- Total Java Threads: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Logs Per Minute: 节点的每分钟不同类型日志情况 - -### 3.3 ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Status: 节点的 SchemaRegion 的状态 -- System Memory Utilization: 节点的系统内存大小 -- Swap Memory Utilization: 节点的交换区内存大小 -- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes Status: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Average Client Active Time: 节点各线程池客户端的平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 -- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 -- 
Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 - -### 3.4 DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- Total Managed Entities: 节点管理的实体情况 -- Write Throughput: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - Average Operation Latency: 节点的各项操作的平均耗时 - - P50 Operation Latency: 节点的各项操作耗时的中位数 - - P99 Operation Latency: 节点的各项操作耗时的P99 -- Thrift统计 - - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS - - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 - - Thrift Connections: 节点的各类型的 Thrfit 连接数量 - - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - - Average Client Active Time: 节点各线程池的客户端平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 - - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 -- Total Tasks: 节点的 Task 数量 -- Task Latency: 节点的 Task 的耗时 -- Compaction - - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 - - Compactions Per Minute: 节点的每分钟合并数量 - - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted-Points Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable -- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable -- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL Files: 节点管理的 WAL 文件数量 - - WAL Nodes: 节点管理的 WAL Node 数量 - - Checkpoint Creation Time: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 - - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 - - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 - - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 - - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 - - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 - - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 - - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 - - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 - - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 - - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 - 
- P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 - - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - Average Query Resource Utilization: 节点查询资源访问数量的平均值 - - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 - - P99 Query Resource Utilization: 节点查询资源访问数量的P99 -- 数据传输耗时 - - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 - - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 - - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 -- 数据传输数量 - - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 - - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - Query Queue Length: 节点查询任务调度数量 - - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 - - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 - - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 - - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 - - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 - - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 - - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 - - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 - - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 - - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 - - P99 Chunk Metadata Filtering Time: 
节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 - - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 - - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 - - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 - - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 - - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 - - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 通过 source handle 获取 TsBlock - - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - P50 Sink 
Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 - - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 - - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 - - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 - - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler Count: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed of Different Stages (avg): 节点 IoT 
Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 diff --git a/src/zh/UserGuide/latest-Table/Reference/System-Tables_apache.md b/src/zh/UserGuide/latest-Table/Reference/System-Tables_apache.md index ebcab5cda..158ef2c78 100644 --- a/src/zh/UserGuide/latest-Table/Reference/System-Tables_apache.md +++ b/src/zh/UserGuide/latest-Table/Reference/System-Tables_apache.md @@ -518,7 +518,6 @@ IoTDB> select * from information_schema.keywords limit 10 | internal\_port | INT32 | ATTRIBUTE | 内部端口 | | version | STRING | ATTRIBUTE | 版本号 | | build\_info | STRING | ATTRIBUTE | CommitID | -| activate\_status(仅企业版) | STRING | ATTRIBUTE | 激活状态 | * 仅管理员可执行操作 * 查询示例: diff --git a/src/zh/UserGuide/latest-Table/User-Manual/Load-Balance.md b/src/zh/UserGuide/latest-Table/User-Manual/Load-Balance.md index 69a934409..88e56ae20 100644 --- a/src/zh/UserGuide/latest-Table/User-Manual/Load-Balance.md +++ b/src/zh/UserGuide/latest-Table/User-Manual/Load-Balance.md @@ -207,7 +207,7 @@ Total line number = 3 It costs 0.110s ``` -7. 其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的,否则需要联系工作人员重新申请激活码信息。 +7. 
其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的。

 #### 1.3.3 手动负载均衡(按需选择)

diff --git a/src/zh/UserGuide/latest-Table/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/latest-Table/User-Manual/Query-Performance-Analysis.md
index c5850192f..aa7a4fd96 100644
--- a/src/zh/UserGuide/latest-Table/User-Manual/Query-Performance-Analysis.md
+++ b/src/zh/UserGuide/latest-Table/User-Manual/Query-Performance-Analysis.md
@@ -28,7 +28,7 @@
 | 方法 | 安装难度 | 业务影响 | 功能范围 |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------- | ------------------------------------------------------ |
 | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 |
-| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |
+| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |
 | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 |

 ## 1. 
Explain 语句
diff --git a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md
index 59851e111..b2c1a3347 100644
--- a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md
+++ b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Docker-Deployment_apache.md
@@ -307,7 +307,7 @@ services:
 version: "3"
 services:
   iotdb-datanode:
-    image: iotdb-enterprise:2.0.x-standalone #使用的镜像
+    image: apache/iotdb:2.0.x-standalone #使用的镜像
     hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一
     container_name: iotdb-datanode
     command: ["bash", "-c", "entrypoint.sh datanode"]
diff --git a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.md
index c96e0c3d8..986ef26fe 100644
--- a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.md
+++ b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.md
@@ -81,7 +81,7 @@ IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵

 ### 2.1 版本要求

-IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。
+IoTDB 支持 Linux、Windows、MacOS 等操作系统及常见 CPU 型号,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。

 ### 2.2 硬盘分区

diff --git a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md
deleted file mode 100644
index 303e5b30c..000000000
--- a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ /dev/null
@@ -1,696 +0,0 @@
-
-# 监控面板部署
-
-IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。
-
-监控面板工具的使用说明可参考文档 [使用说明](../Tools-System/Monitor-Tool.md) 章节。
-
-## 1. 安装准备
-
-1. 
安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取
-2. 获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取
-
-## 2. 安装步骤
-
-### 2.1 步骤一:IoTDB开启监控指标采集
-
-1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。
-
-| 配置项 | 所在配置文件 | 配置说明 |
-| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- |
-| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS |
-| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT |
-| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 |
-| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS |
-| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT |
-| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 |
-
-以3C3D集群为例,需要修改的监控配置如下:
-
-| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 |
-| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ |
-| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 |
-| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT 
dn_metric_prometheus_reporter_port=9092 |
-| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 |
-
-2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务:
-
-```shell
-# Unix/OS X
-./sbin/stop-standalone.sh #先停止confignode和datanode
-./sbin/start-confignode.sh -d #启动confignode
-./sbin/start-datanode.sh -d #启动datanode
-
-# Windows
-# V2.0.4.x 版本之前
-.\sbin\stop-standalone.bat
-.\sbin\start-confignode.bat
-.\sbin\start-datanode.bat
-
-# V2.0.4.x 版本及之后
-.\sbin\windows\stop-standalone.bat
-.\sbin\windows\start-confignode.bat
-.\sbin\windows\start-datanode.bat
-```
-
-3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功:
-
-![](/img/%E5%90%AF%E5%8A%A8.png)
-
-### 2.2 步骤二:安装、配置Prometheus
-
-> 此处以prometheus安装在服务器192.168.1.3为例。
-
-1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/)
-2. 解压安装包,进入解压后的文件夹:
-
-```Shell
-tar xvfz prometheus-*.tar.gz
-cd prometheus-*
-```
-
-3. 修改配置。修改配置文件prometheus.yml如下
-   1. 新增confignode任务收集ConfigNode的监控数据
-   2. 新增datanode任务收集DataNode的监控数据
-
-```shell
-global:
-  scrape_interval: 15s
-  evaluation_interval: 15s
-scrape_configs:
-  - job_name: "prometheus"
-    static_configs:
-      - targets: ["localhost:9090"]
-  - job_name: "confignode"
-    static_configs:
-      - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"]
-    honor_labels: true
-  - job_name: "datanode"
-    static_configs:
-      - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"]
-    honor_labels: true
-```
-
-4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示:
-
-```Shell
-./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d
-```
-
-5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。
-
-
-6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息:
-
-![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png)
-
-### 2.3 步骤三:安装grafana并配置数据源
-
-> 此处以Grafana安装在服务器192.168.1.3为例。
-
-1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download)
-2. 解压并进入对应文件夹
-
-```Shell
-tar -zxvf grafana-*.tar.gz
-cd grafana-*
-```
-
-3. 启动Grafana:
-
-```Shell
-./bin/grafana-server web
-```
-
-4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。
-
-5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus
-
-![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png)
-
-在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功
-
-![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png)
-
-### 2.4 步骤四:导入IoTDB Grafana看板
-
-1. 进入Grafana,选择Dashboards:
-
-   ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png)
-
-2. 点击右侧 Import 按钮
-
-   ![](/img/Import%E6%8C%89%E9%92%AE.png)
-
-3. 使用upload json file的方式导入Dashboard
-
-   ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png)
-
-4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】):
-
-   ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png)
-
-5. 选择数据源为Prometheus,然后点击Import
-
-   ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png)
-
-6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板
-
-   ![](/img/%E9%9D%A2%E6%9D%BF.png)
-
-7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板:
-
- -8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## 3. 附录、监控指标详解 - -### 3.1 系统面板(System Dashboard) - -该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 - -#### CPU - -- CPU Cores:CPU 核数 -- CPU Utilization: - - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 -- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 - -#### Memory - -- System Memory:当前系统内存的使用情况。 - - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total Physical Memory:系统可用物理内存的总量。 - - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 -- System Swap Memory:交换空间(Swap Space)内存用量。 -- Process Memory:IoTDB 进程使用内存的情况。 - - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - - Used Memory:IoTDB 进程当前已经使用的内存总量。 - -#### Disk - -- Disk Space: - - Total Disk Space:IoTDB 可使用的最大磁盘空间。 - - Used Disk Space:IoTDB 已经使用的磁盘空间。 -- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 -- File Count:IoTDB 相关文件数量 - - All:所有文件数量 - - TsFile:TsFile 数量 - - Seq:顺序 TsFile 数量 - - Unseq:乱序 TsFile 数量 - - WAL:WAL 文件数量 - - Cross-Temp:跨空间合并 temp 文件数量 - - Inner-Seq-Temp:顺序空间内合并 temp 文件数量 - - Innsr-Unseq-Temp:乱序空间内合并 temp 文件数量 - - Mods:墓碑文件数量 -- Open File Handles:系统打开的文件句柄数量 -- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 -- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 -- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 
attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 - -#### JVM - -- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 -- Heap Memory:JVM 堆内存使用情况。 - - Maximum Heap Memory:JVM 最大可用的堆内存大小。 - - Committed Heap Memory:JVM 已提交的堆内存大小。 - - Used Heap Memory:JVM 已经使用的堆内存大小。 - - PS Eden Space:PS Young 区的大小。 - - PS Old Space:PS Old 区的大小。 - - PS Survivor Space:PS Survivor 区的大小。 - - ...(CMS/G1/ZGC 等) -- Off-Heap Memory:堆外内存用量。 - - Direct Memory:堆外直接内存。 - - Mapped Memory:堆外映射内存。 -- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 -- Loaded & Unloaded Classes: - - Loaded:JVM 目前已经加载的类的数量 - - Unloaded:系统启动至今 JVM 卸载的类的数量 -- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### Network - -eno 指的是到公网的网卡,lo 是虚拟网卡。 - -- Network Speed:网卡发送和接收数据的速度 -- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) - -### 3.2 整体性能面板(Performance Overview Dashboard) - -#### Cluster Overview - -- Total CPU Cores: 集群机器 CPU 总核数 -- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 -- 磁盘 - - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 -- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 -- Up Time: 集群启动至今的时长 -- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 -- 内存 - - Total System Memory: 集群机器系统内存总大小 - - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 -- Total Files: 集群管理文件总数量 -- Cluster System Overview: 
集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBases: 集群管理的 Database 总数(含副本) -- Total DataRegions: 集群管理的 DataRegion 总数 -- Total SchemaRegions: 集群管理的 SchemaRegion 总数 - -#### Node Overview - -- CPU Cores: 节点所在机器的 CPU 核数 -- Disk Space: 节点所在机器的磁盘大小 -- Time Series: 节点所在机器管理的时间序列数量(含副本) -- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Throughput: 节点所在机器的每秒写入速度(含副本) -- System Memory: 节点所在机器的系统内存大小 -- Swap Memory: 节点所在机器的交换内存大小 -- File Count: 节点管理的文件数 - -#### Performance - -- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 -- Average Interface Latency: 节点的各个 thrift 接口平均耗时 -- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 -- Total Tasks: 节点的各项系统任务数量 -- Average Task Latency: 节点的各项系统任务的平均耗时 -- P99 Task Latency: 节点的各项系统任务的 P99 耗时 -- Operations Per Second: 节点的每秒操作数 -- 主流程 - - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 - - Average Stage Latency: 节点主流程各阶段平均耗时 - - P99 Stage Latency: 节点主流程各阶段 P99 耗时 -- Schedule 阶段 - - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 - - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 - - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 -- Local Schedule 各子阶段 - - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 - - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 -- Storage 阶段 - - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 - - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 - - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 -- Engine 阶段 - - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 - - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 - - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 - -#### System - -- CPU Utilization: 节点的 CPU 负载 -- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC -- Heap Memory: 节点的堆内存使用情况 -- Off-Heap 
Memory: 节点的非堆内存使用情况 -- Total Java Threads: 节点的 Java 线程数量情况 -- File Count: 节点管理的文件数量情况 -- File Size: 节点管理文件大小情况 -- Logs Per Minute: 节点的每分钟不同类型日志情况 - -### 3.3 ConfigNode 面板(ConfigNode Dashboard) - -该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 - -#### Node Overview - -- Database Count: 节点的数据库数量 -- Region - - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Status: 节点的 DataRegion 的状态 - - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Status: 节点的 SchemaRegion 的状态 -- System Memory Utilization: 节点的系统内存大小 -- Swap Memory Utilization: 节点的交换区内存大小 -- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes Status: 节点所在集群的 DataNode 情况 -- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 - -#### NodeInfo - -- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode -- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 -- DataNode Status: 节点所在集群的 DataNode 节点的状态 -- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 -- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 -- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 -- DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 - -#### Protocol - -- 客户端数量统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 -- 客户端时间情况 - - Average Client Active Time: 节点各线程池客户端的平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Partition Table - -- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 -- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 -- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 -- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 -- DataRegion Status: 节点所在集群的 DataRegion 状态 -- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 - -#### Consensus - -- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 -- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 -- 
Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 - -### 3.4 DataNode 面板(DataNode Dashboard) - -该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 - -#### Node Overview - -- Total Managed Entities: 节点管理的实体情况 -- Write Throughput: 节点的每秒写入速度 -- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 - -#### Protocol - -- 节点操作耗时 - - Average Operation Latency: 节点的各项操作的平均耗时 - - P50 Operation Latency: 节点的各项操作耗时的中位数 - - P99 Operation Latency: 节点的各项操作耗时的P99 -- Thrift统计 - - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS - - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 - - Thrift Connections: 节点的各类型的 Thrfit 连接数量 - - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 -- 客户端统计 - - Active Clients: 节点各线程池的活跃客户端数量 - - Idle Clients: 节点各线程池的空闲客户端数量 - - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 - - Created Clients Per Second: 节点各线程池的创建客户端数量 - - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - - Average Client Active Time: 节点各线程池的客户端平均活跃时间 - - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 - - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 - -#### Storage Engine - -- File Count: 节点管理的各类型文件数量 -- File Size: 节点管理的各类型文件大小 -- TsFile - - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 - - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 -- Total Tasks: 节点的 Task 数量 -- Task Latency: 节点的 Task 的耗时 -- Compaction - - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 - - Compactions Per Minute: 节点的每分钟合并数量 - - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted-Points Per Minute: 节点每分钟合并的点数 - -#### Write Performance - -- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable -- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable -- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable -- WAL - - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL Files: 节点管理的 WAL 文件数量 - - WAL Nodes: 节点管理的 WAL Node 数量 - - Checkpoint Creation Time: 
节点创建各类型的 CheckPoint 的耗时 - - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - - WAL Buffer - - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 -- Flush统计 - - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 -- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 -- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 - -#### Schema Engine - -- Schema Engine Mode: 节点的元数据引擎模式 -- Schema Consensus Protocol: 节点的元数据共识协议 -- Schema Region Number: 节点管理的 SchemaRegion 数量 -- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 -- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 -- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 -- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) -- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 
buffer node 个数 -- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 -- 时间序列统计 - - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 - - Series Type: 节点不同类型的时间序列数量 - - Time Series Number: 节点的时间序列总数 - - Template Series Number: 节点的模板时间序列总数 - - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 -- IMNode统计 - - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 - - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 - - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 - - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 - - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 - - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 -- Cache Hit Rate: 节点的缓存命中率 -- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 -- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 -- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 - -#### Query Engine - -- 各阶段耗时 - - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 - - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 - - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 -- 执行计划分发耗时 - - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 - - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 - - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 -- 执行计划执行耗时 - - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 - - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 - - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 -- 算子执行耗时 - - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 - - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 - - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 -- 聚合查询计算耗时 - - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 - - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 - - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 -- 文件/内存接口耗时 - - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 - 
- P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 - - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 -- 资源访问数量 - - Average Query Resource Utilization: 节点查询资源访问数量的平均值 - - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 - - P99 Query Resource Utilization: 节点查询资源访问数量的P99 -- 数据传输耗时 - - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 - - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 - - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 -- 数据传输数量 - - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 - - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 -- 任务调度数量与耗时 - - Query Queue Length: 节点查询任务调度数量 - - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 - - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 - - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 - -#### Query Interface - -- 加载时间序列元数据 - - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 -- 读取时间序列 - - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 - - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 - - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 -- 修改时间序列元数据 - - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 - - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 - - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 -- 加载Chunk元数据列表 - - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 - - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 - - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 -- 修改Chunk元数据 - - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 - - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 - - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 -- 按照Chunk元数据过滤 - - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 - - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 - - P99 Chunk Metadata Filtering Time: 
节点查询按照Chunk元数据过滤耗时的P99 -- 构造Chunk Reader - - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 - - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 - - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 -- 读取Chunk - - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 - - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 - - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 -- 初始化Chunk Reader - - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 - - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 - - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 -- 查询通过 Merge Reader 构造 TsBlock - - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 - -#### Query Data Exchange - -查询的数据交换耗时。 - -- 通过 source handle 获取 TsBlock - - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 -- 通过 source handle 反序列化 TsBlock - - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - P50 Sink 
Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 -- 回调 data block event - - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 - - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 - - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 - - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 - - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 - -#### Query Related Resource - -- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 -- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 -- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 -- Coordinator: 节点上记录的查询数量 -- MemoryPool Size: 节点查询相关的内存池情况 -- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler Count: 节点查询相关的队列任务数量 - -#### Consensus - IoT Consensus - -- 内存使用 - - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 -- 节点间同步情况 - - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 -- 不同执行阶段耗时 - - The Time Consumed of Different Stages (avg): 节点 IoT 
Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 - -#### Consensus - SchemaRegion Ratis Consensus - -- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 -- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 diff --git a/src/zh/UserGuide/latest/Ecosystem-Integration/DataEase.md b/src/zh/UserGuide/latest/Ecosystem-Integration/DataEase.md index c23599b95..b8e0fcfb8 100644 --- a/src/zh/UserGuide/latest/Ecosystem-Integration/DataEase.md +++ b/src/zh/UserGuide/latest/Ecosystem-Integration/DataEase.md @@ -44,12 +44,12 @@ | :-------------------- | :----------------------------------------------------------- | | IoTDB | 版本无要求,安装请参考 IoTDB [部署指导](../QuickStart/QuickStart_apache.md) | | JDK | 建议 JDK11 及以上版本(推荐部署 JDK17 及以上版本) | -| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x,其他版本适配请联系工作人员) | -| DataEase-IoTDB 连接器 | 请联系工作人员获取 | +| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x) | +| DataEase-IoTDB 连接器 | 获取安装包 | ## 3. 
安装步骤

-步骤一:请联系商务获取压缩包,解压缩安装包( iotdb-api-source-1.0.0.zip )
+步骤一:解压缩安装包( iotdb-api-source-1.0.0.zip )

 步骤二:解压后,修改`config`文件夹中的配置文件`application.properties`

diff --git a/src/zh/UserGuide/latest/Ecosystem-Integration/Thingsboard.md b/src/zh/UserGuide/latest/Ecosystem-Integration/Thingsboard.md
index d3bdd017a..e34e02362 100644
--- a/src/zh/UserGuide/latest/Ecosystem-Integration/Thingsboard.md
+++ b/src/zh/UserGuide/latest/Ecosystem-Integration/Thingsboard.md
@@ -42,13 +42,13 @@
 | :-------------------------- | :----------------------------------------------------------- |
 | JDK | 要求已安装 17 及以上版本,具体下载请查看 [Oracle 官网](https://www.oracle.com/java/technologies/downloads/) |
 | IoTDB | 要求已安装 V1.3.0 及以上版本,具体安装过程请参考[ 部署指导](../QuickStart/QuickStart_apache.md) |
-| ThingsBoard(IoTDB 适配版) | 安装包请联系商务获取,具体安装步骤参见下文 |
+| ThingsBoard(IoTDB 适配版) | 获取安装包,具体安装步骤参见下文 |

 ## 3. 安装步骤

 具体安装步骤请参考 [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)。其中:

-- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用上方从商务获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb)
+- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb)
 - [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 3 配置 ThingsBoard 数据库-ThingsBoard 配置】步骤中需要按照下方内容添加环境变量

 ```Bash
diff --git a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
index bea5273a0..2f8bcb7d4 100644
--- a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
@@ -28,10 +28,10 @@
 ## 1. 安装步骤

 1. 
请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 - | UDF 安装包 | 支持的 IoTDB 版本 | 下载链接 | - | -------------------- | ------------- | --------- | - | apache-UDF-1.3.3.zip | V1.3.3及以上 | 请联系商业支持获取 | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系商业支持获取 | + | UDF 安装包 | 支持的 IoTDB 版本 | + | -------------------- | ------------- | + | apache-UDF-1.3.3.zip | V1.3.3及以上 | + | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | 2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 3. 在 IoTDB 的 SQL 命令行终端(CLI)的 SQL 操作界面中,执行下述相应的函数注册语句。 4. 批量注册:两种注册方式:注册脚本 或 SQL汇总语句 diff --git a/src/zh/UserGuide/latest/User-Manual/Load-Balance.md b/src/zh/UserGuide/latest/User-Manual/Load-Balance.md index 0be76a388..60d10c7c7 100644 --- a/src/zh/UserGuide/latest/User-Manual/Load-Balance.md +++ b/src/zh/UserGuide/latest/User-Manual/Load-Balance.md @@ -207,7 +207,7 @@ Total line number = 3 It costs 0.110s ``` -7. 其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的,否则需要联系工作人员重新申请激活码信息。 +7. 其它节点重复以上操作,值得注意的是,新节点能够成功加入原集群需保证原集群允许加入的DataNode节点数量是足够的。 #### 1.3.3 手动负载均衡(按需选择) diff --git a/src/zh/UserGuide/latest/User-Manual/Query-Performance-Analysis.md b/src/zh/UserGuide/latest/User-Manual/Query-Performance-Analysis.md index 0debbeb6c..09d874795 100644 --- a/src/zh/UserGuide/latest/User-Manual/Query-Performance-Analysis.md +++ b/src/zh/UserGuide/latest/User-Manual/Query-Performance-Analysis.md @@ -28,7 +28,7 @@ | 方法 | 安装难度 | 业务影响 | 功能范围 | | :------------------ | :----------------------------------------------------------- | :--------------------------------------------------- | :----------------------------------------------------- | | Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 | -| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | +| 监控面板 | 中。需要安装IoTDB监控面板工具,并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | | Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 
不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | ## 1. Explain 语句