Deepgreen & Greenplum 高可用(一) - Segment节点故障转移

    xiaoxiao2023-08-28  150

    尚书中云:惟事事,乃其有备,有备无患。这教导我们做事一定要有准备,做事尚且如此,在企事业单位发展中处于基础地位的数据仓库软件在运行过程中,何尝不需要有备无患呢? 今天别的不表,主要来谈谈企业级数据仓库软件Deepgreen和Greenplum的高可用特性之一:计算节点镜像。 一、首先从理论上来讲,正常Segment节点和他的Mirror是分布在不同主机上的,以防止单点故障导致的数据库访问异常。当正常Segment节点出现故障时,Mirror节点可以自动接管Segment节点的服务,数据库仍然可以正常使用。这个过程对前台应用来说是透明的。 下面我们来看实操例子: 1.测试环境 本人笔记本电脑; 4核心,8G内存,1T硬盘 已安装1个主节点和2个计算节点,无镜像 2.首先查看集群状态:1 Master,2 Segments ## 执行命令:gpstate 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:--------------------------- 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master instance parameters 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:--------------------------- 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Database = template1 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master Port = 15432 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master directory = /dgdata/master/dg-1 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Timeout = 600 seconds 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master standby = Off 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:--------------------------------------- 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Segment instances that will be started 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:--------------------------------------- 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- Host Datadir Port 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg0 25432 20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg1 25433 3.创建镜像节点目录并添加节点 ## 创建mirror目录 dgadmin@flash:~$ mkdir /dgdata/mirror ## 执行命令:gpaddmirrors添加节点 dgadmin@flash:~$ gpaddmirrors 20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting gpaddmirrors with args: 20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB' 20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19' 20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Obtaining Segment details from master... Enter mirror segment data directory location 1 of 2 > /dgdata/mirror Enter mirror segment data directory location 2 of 2 > /dgdata/mirror 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum Add Mirrors Parameters 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master data directory = /dgdata/master/dg-1 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master port = 15432 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Parallel batch limit = 16 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 1 of 2 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg0 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25432 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28432 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg0 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26432 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27432 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 2 of 2 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg1 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25433 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28433 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg1 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26433 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27433 20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:---------------------------------------------------------- Continue with add mirrors procedure Yy|Nn (default=N): > y 20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-2 segment(s) to add 20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Building template directory 20170628:06:42:07:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Validating remote directories ...... 20170628:06:42:13:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Copying template directory file . 20170628:06:42:14:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Configuring new segments ...... 20170628:06:42:20:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Cleaning files . 20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg0:content=0:dbid=4:mode=r:status=u 20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg1:content=1:dbid=5:mode=r:status=u 20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration with new mirrors 20170628:06:42:22:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating mirrors ...... 20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting mirrors 20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... ....... 20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results... 20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up 20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating primaries 20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait... ...... 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results... 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Done updating primaries 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-****************************************************************** 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror segments have been added; data synchronization is in progress. 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Data synchronization will continue in the background. 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:- 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress. 20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-***************************************************************** 4.重新查看集群状态:确定2个Segments都已经添加好Mirror dgadmin@flash:~$ gpstate 20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args: 20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB' 20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19' 20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master... 20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments... .... 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master instance = Active 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as primary segments = 0 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 2 20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- dgadmin@flash:~$ 5.登录数据库进行查询测试 dgadmin@flash:~$ psql -d postgres psql (8.2.15) Type "help" for help. postgres=# \c You are now connected to database "postgres" as user "dgadmin". postgres=# \l List of databases Name | Owner | Encoding | Access privileges -----------+---------+----------+--------------------- postgres | dgadmin | UTF8 | template0 | dgadmin | UTF8 | =c/dgadmin : dgadmin=CTc/dgadmin template1 | dgadmin | UTF8 | =c/dgadmin : dgadmin=CTc/dgadmin tpch | dgadmin | UTF8 | tpch_1g | dgadmin | UTF8 | (5 rows) postgres=# \c tpch You are now connected to database "tpch" as user "dgadmin". tpch=# \dt List of relations Schema | Name | Type | Owner | Storage --------+----------+-------+---------+---------------------- public | customer | table | dgadmin | append only columnar public | lineitem | table | dgadmin | append only columnar public | nation | table | dgadmin | append only columnar public | orders | table | dgadmin | append only columnar public | part | table | dgadmin | append only columnar public | partsupp | table | dgadmin | append only columnar public | region | table | dgadmin | append only columnar public | supplier | table | dgadmin | append only columnar (8 rows) tpch=# select * from region; r_regionkey | r_name | r_comment -------------+---------------------------+--------------------------------------------------------------------------------------------------------------------- 1 | AMERICA | hs use ironic, even requests. s 3 | EUROPE | ly final courts cajole furiously final excuse 0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to 2 | ASIA | ges. thinly even pinto beans ca 4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl (5 rows) 6.模拟实验:Segment1实例异常 ## 查询dg1的后台进程号:3898 dgadmin@flash:~$ ps -ef | grep postgres postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process postgres 1100 1089 0 Jun27 ? 00:00:00 postgres: writer process postgres 1101 1089 0 Jun27 ? 00:00:00 postgres: wal writer process postgres 1102 1089 0 Jun27 ? 00:00:00 postgres: autovacuum launcher process postgres 1103 1089 0 Jun27 ? 00:00:02 postgres: stats collector process dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1 dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0 dgadmin 3915 3898 0 06:26 ? 00:00:00 postgres: port 25433, sweeper process dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E dgadmin 3928 3927 0 06:27 ? 00:00:00 postgres: port 15432, master logger process dgadmin 3931 3927 0 06:27 ? 00:00:00 postgres: port 15432, stats collector process dgadmin 3932 3927 0 06:27 ? 00:00:00 postgres: port 15432, writer process dgadmin 3933 3927 0 06:27 ? 00:00:00 postgres: port 15432, checkpoint process dgadmin 3934 3927 0 06:27 ? 00:00:00 postgres: port 15432, seqserver process dgadmin 3935 3927 0 06:27 ? 00:00:00 postgres: port 15432, ftsprobe process dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1 dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0 dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process dgadmin 5015 5012 0 06:42 ? 00:00:00 postgres: port 26433, logger process dgadmin 5022 5013 0 06:42 ? 00:00:00 postgres: port 26432, mirror process dgadmin 5023 5012 0 06:42 ? 00:00:00 postgres: port 26433, mirror process dgadmin 5106 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary consumer ack process dgadmin 5107 5103 0 06:42 ? 00:00:01 postgres: port 25433, primary recovery process dgadmin 5108 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary verification process dgadmin 5114 3899 0 06:42 ? 00:00:01 postgres: port 25432, primary process dgadmin 5115 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary receiver ack process dgadmin 5116 5114 1 06:42 ? 00:00:20 postgres: port 25432, primary sender process dgadmin 5117 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary consumer ack process dgadmin 5118 5114 0 06:42 ? 00:00:01 postgres: port 25432, primary recovery process dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres ## 杀死进程,模拟进程异常 dgadmin@flash:~$ ps -ef | grep postgres postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1 dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0 dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1 dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0 dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres ## 杀死进程,模拟异常 dgadmin@flash:~$ kill -9 3898 ## 重新查看,dg1的进程3989已不存在 dgadmin@flash:~$ ps -ef | grep postgres postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0 dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1 dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0 dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process dgadmin 5939 3733 0 07:09 pts/0 00:00:00 grep --color=auto postgres 7.访问数据库测试连通性:仍然可以正常查询 dgadmin@flash:~$ psql -d tpch psql (8.2.15) Type "help" for help. tpch=# select * from region; r_regionkey | r_name | r_comment -------------+---------------------------+--------------------------------------------------------------------------------------------------------------------- 1 | AMERICA | hs use ironic, even requests. s 3 | EUROPE | ly final courts cajole furiously final excuse 0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to 2 | ASIA | ges. thinly even pinto beans ca 4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl (5 rows) 8.查看集群状态:显示有1个primary节点异常 dgadmin@flash:~$ gpstate 20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args: 20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB' 20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19' 20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master... 20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments... . 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master instance = Active 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 1 20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total primary segment failures (at master) = 1 <<<<<<<< 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number postmaster processes missing = 1 <<<<<<<< 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 1 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2 20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<< 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1 20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- dgadmin@flash:~$ 至此本实验完成,可以看到,在Segment主实例异常情况下,Mirror会立即接管服务,不会对前台应用产生停机影响。 实验完成了,但是事情还没有结束哦,因为这是故障,所以故障产生后,需要进行修复,那么基于整个集群,我们都需要做什么呢? 要点一:及时恢复故障节点 要点二:虽然Mirror可以临时接管服务,保持服务的连续性,但是在实际生产过程中,由于节点及其Mirror的分散性,长期使用Mirror会导致数据分布不均匀,所以故障修复后,建议及时切换回原来的架构。 二、故障节点恢复 节点在故障以后,可以通过gprecoverseg命令恢复故障节点,如下: dgadmin@flash:~$ gprecoverseg 20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting gprecoverseg with args: 20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB' 20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19' 20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Checking if segments are ready 20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master... 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master... 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Greenplum instance recovery parameters 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery type = Standard 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery 1 of 1 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:---------------------------------------------------------- 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Synchronization mode = Incremental 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance host = flash 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance address = flash 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance directory = /dgdata/primary/dg1 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance port = 25433 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance replication port = 28433 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance host = flash 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance address = flash 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance directory = /dgdata/mirror/dg1 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance port = 26433 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance replication port = 27433 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Target = in-place 20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:---------------------------------------------------------- Continue with segment recovery procedure Yy|Nn (default=N): > y 20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-1 segment(s) to recover 20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped 20170628:22:01:39:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments 20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration with new mirrors 20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating mirrors ...... 20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting mirrors 20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... ....... 20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results... 20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up 20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating primaries 20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 1 segments, please wait... ....... 20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results... 20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Done updating primaries 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-****************************************************************** 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating segments for resynchronization is completed. 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background. 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:- 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress. 20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-****************************************************************** 查看集群状态 dgadmin@flash:~$ gpstate 20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args: 20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB' 20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19' 20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master... 20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments... . 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master instance = Active 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2 20170628:22:19:46:003527 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<< 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1 20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:----------------------------------------------------- 三、切换回原集群状态 从第二部分的最后状态代码也可以看出,目前是一个Mirror节点接管了Primary节点的服务,我们本节要讲Mirror的服务交还给Primary。 下面执行:gprecoverseg -r 命令进行节点切换,切换完成后执行gpstate查看状态,代码略。 另外,我们也可以在数据库里面通过字典表查看切换信息: postgres=# select * from gp_configuration_history; time | dbid | desc -------------------------------+------+----------------------------------------------------------------------------------------------- 2017-06-28 06:42:21.789641+08 | 4 | gpaddmirrors: segment config for resync: inserted mirror segment configuration 2017-06-28 06:42:21.789641+08 | 5 | gpaddmirrors: segment config for resync: inserted mirror segment configuration 2017-06-28 06:42:21.789641+08 | 2 | gpaddmirrors: segment config for resync: segment mode, status, and replication port 2017-06-28 06:42:21.789641+08 | 3 | gpaddmirrors: segment config for resync: segment mode, status, and replication port 2017-06-28 06:42:35.327377+08 | 2 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 06:42:35.327377+08 | 3 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 06:42:35.327377+08 | 4 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 06:42:35.327377+08 | 5 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 06:56:03.98806+08 | 2 | FTS: changed segment to insync from resync. 2017-06-28 06:56:03.98806+08 | 4 | FTS: changed segment to insync from resync. 2017-06-28 06:56:04.045572+08 | 3 | FTS: changed segment to insync from resync. 2017-06-28 06:56:04.045572+08 | 5 | FTS: changed segment to insync from resync. 2017-06-28 07:10:07.745568+08 | 3 | FTS: content 1 fault marking status DOWN role m 2017-06-28 07:10:07.745849+08 | 5 | FTS: content 1 fault marking status UP mode: change-tracking role p 2017-06-28 22:01:45.341958+08 | 3 | gprecoverseg: segment config for resync: segment mode and status 2017-06-28 22:01:58.732622+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 22:01:58.732622+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 22:02:57.53962+08 | 5 | FTS: changed segment to insync from resync. 2017-06-28 22:02:57.53962+08 | 3 | FTS: changed segment to insync from resync. 2017-06-28 22:31:02.680013+08 | 5 | FTS: content 1 fault marking status DOWN role m 2017-06-28 22:31:02.680309+08 | 3 | FTS: content 1 fault marking status UP mode: change-tracking role p 2017-06-28 22:31:12.655695+08 | 5 | gprecoverseg: segment config for resync: segment mode and status 2017-06-28 22:31:25.848857+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 22:31:25.848857+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status 2017-06-28 22:31:58.752335+08 | 3 | FTS: changed segment to insync from resync. 2017-06-28 22:31:58.752335+08 | 5 | FTS: changed segment to insync from resync. (26 rows) 这样,整个今天的分享就结束了,最后再啰嗦一句。从本文例子看出,数据库主/备切换相当的简单,gprecoverseg命令相当的智能,在Primary的主机出现故障之后,Mirror会自动切换为Primary,不影响数据库的正常工作,但是对监控不是很到位的系统来说,不建议使用这个功能,首先这个功能存在一定的BUG,其次,监控不到位,一旦发现切换,并不能及时发现,如果再有节点出现故障,可能对数据恢复造成影响,而且如果单个节点的数据量非常大的时候,gprecoverseg同步数据的过程将会很漫长。 最后,祝大家再Deepgreen & Greenplum的路上一去不复返^_^ 同系列文章参考:

    Deepgreen & Greenplum 高可用(二) - Master故障转移

    相关资源:七夕情人节表白HTML源码(两款)
    最新回复(0)