MatrixDB 4 can be installed on Red Hat 7, CentOS 7, and CentOS-compatible operating systems. This document describes how to quickly deploy a MatrixDB 4 cluster across multiple CentOS 7 servers or virtual machines. The example uses three nodes: the master node is mdw, and the two data nodes are sdw1 and sdw2.
For a video walkthrough of this course, see MatrixDB Installation and Deployment (https://www.bilibili.com/video/BV1Jf4y1b7Ef/).
Note: MatrixDB requires a CPU of at least the Intel Haswell architecture or the AMD Excavator architecture.
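Haswell and Excavator are the first Intel and AMD microarchitectures to support the AVX2 instruction set, so one quick way to screen a host is to look for the avx2 flag in /proc/cpuinfo. The sketch below assumes that AVX2 is the capability behind this requirement, which the note does not state explicitly. has_avx2 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the cpuinfo file (default /proc/cpuinfo)
# lists the avx2 CPU flag. Assumes AVX2 is what the requirement implies.
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx2; then
    echo "avx2 flag found"
else
    echo "avx2 flag not found; this CPU may not meet the requirement"
fi
```

Run the check on every node, since all of them will host MatrixDB processes.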
The server installation process includes five steps: installation preparation, database RPM installation, Python dependency package installation, database initialization and post-installation settings.
Perform the following operations on all nodes as the root user.
MatrixDB 4 requires Python 3.6. Use the following commands to install Python 3.6 and start a shell in which it is the default version:
yum install centos-release-scl
yum install rh-python36
scl enable rh-python36 bash
Install the Parquet dependencies:
yum install -y epel-release || yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1).noarch.rpm
yum install -y https://apache.jfrog.io/artifactory/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1)/apache-arrow-release-latest.rpm
yum install -y arrow-libs-3.0.0 parquet-libs-3.0.0
Turn off the firewall:
systemctl stop firewalld.service
systemctl disable firewalld.service
Turn off SELinux: set SELINUX to disabled in /etc/selinux/config so that the change persists across reboots, then run setenforce 0 to stop enforcement immediately:
sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config
setenforce 0
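To confirm that the change took effect, the configured value can be read back from the file. This is a minimal sketch; selinux_config_mode is a hypothetical helper name, and the getenforce call is skipped when that tool is not installed.

```shell
# Hypothetical helper: print the SELINUX= value from a config file
# (default /etc/selinux/config). Commented-out lines are not matched.
selinux_config_mode() {
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "${1:-/etc/selinux/config}" 2>/dev/null
}

mode=$(selinux_config_mode)
echo "configured SELinux mode: ${mode:-unknown}"

# getenforce reports the running mode (Permissive after setenforce 0).
if command -v getenforce >/dev/null 2>&1; then
    getenforce
fi
```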
Make sure that every node has a persistent host name. If one is missing, set it with the following command. For example, on the master node:
hostnamectl set-hostname mdw
On the two data nodes, set the corresponding host names (run each command on its own node):
hostnamectl set-hostname sdw1
hostnamectl set-hostname sdw2
Ensure that all nodes in the cluster can reach each other by both host name and IP address. Add records to /etc/hosts that map each host name to the address of a local network interface. For example, /etc/hosts on each of the three nodes contains entries like this:
192.168.100.10 mdw
192.168.100.11 sdw1
192.168.100.12 sdw2
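A quick check that each node's /etc/hosts contains all three names can catch typos before deployment. The sketch below only inspects the file and does not test network reachability. hosts_has_entry is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the hosts file (default /etc/hosts)
# maps the given host name to some address outside of comments.
hosts_has_entry() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # drop comments, including whole-line ones
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}" 2>/dev/null
}

for h in mdw sdw1 sdw2; do
    if hosts_has_entry "$h"; then
        echo "$h: present in /etc/hosts"
    else
        echo "$h: missing from /etc/hosts"
    fi
done
```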
On all nodes, as the root user, run the following yum command to install the database RPM package. The system libraries it depends on are installed automatically. By default, MatrixDB is installed in the /usr/local/matrixdb directory:
yum install matrixdb-4.0.0-1.el7.x86_64.rpm
Note: In an actual installation, replace the file name with the name of the latest RPM package you downloaded.
On all nodes, as the root user, run the following commands to install the Python packages that MatrixDB depends on. You must run source greenplum_path.sh first so that the correct versions of the packages are installed:
source /usr/local/matrixdb/greenplum_path.sh
yum install gcc python3-devel
pip3 install --upgrade setuptools
pip3 install argparse psutil pygresql pyyaml
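After the installation, it can be useful to confirm that the packages import cleanly in the same shell. The sketch below assumes python3 on the PATH is the interpreter that pip3 installed into. check_py_modules is a hypothetical helper name. Note that import names differ from package names for two of them: pygresql is imported as pg, and pyyaml as yaml.

```shell
# Hypothetical helper: try to import each named module with python3 and
# report the result. Returns non-zero if any import fails.
check_py_modules() {
    missing=0
    for mod in "$@"; do
        if python3 -c "import $mod" >/dev/null 2>&1; then
            echo "ok: $mod"
        else
            echo "missing: $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Import names: pygresql -> pg, pyyaml -> yaml.
check_py_modules argparse psutil pg yaml || echo "some modules are missing"
```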
This guide uses the graphical deployment tool provided by MatrixDB. Remote graphical deployment requires that ports 8240 and 4617 on the servers be reachable. After the installation is complete, these ports are open on all nodes by default.
Open the following installation wizard URL in your browser, where <IP> is the IP address of the mdw server:
http://<IP>:8240/
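If the wizard page does not load, checking the two ports from the machine running the browser can narrow the problem down. This is a sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command. port_open is a hypothetical helper, and MDW_IP is a placeholder that defaults to the example address used earlier.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 2 seconds. Relies on bash's /dev/tcp and coreutils timeout.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# MDW_IP is a placeholder; substitute the real address of the mdw server.
MDW_IP="${MDW_IP:-192.168.100.10}"
for port in 8240 4617; do
    if port_open "$MDW_IP" "$port"; then
        echo "port $port reachable"
    else
        echo "port $port not reachable"
    fi
done
```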
On the first page of the installation wizard, select "Add multiple nodes and initialize the database cluster" and click Next:
Next, work through the five steps of multi-node deployment.
Step 1: Add nodes. Enter the IP address, host name, or FQDN of a node in the text box and click "Add Node":
After adding sdw1 and sdw2, click "Next".
The wizard then tests connectivity between the hosts to ensure that they can reach each other over the network.
Step 2: Configure the database. Select the storage path for the database directory and the number of segments. The system automatically recommends the disk with the most space and a number of segments that matches the system resources. Both can be adjusted for your usage scenario. "Enable automatic data mirroring" determines whether the cluster's data nodes keep mirror copies. Enabling it is recommended in production so that the cluster is highly available. After confirming, click "Next":
Step 3: Set the password. MatrixDB creates a database administrator account named mxadmin, which serves as the superuser account. In this step, set the password for the mxadmin account and click "Next". Note that this is the password of the database account, not of the operating system account:
Step 4: Confirm the deployment. This step lists the configuration parameters chosen in the previous steps. After confirming that they are correct, click "Execute deployment":
The system then deploys the cluster automatically and lists the detailed steps and their progress.
After all the steps have completed successfully, the deployment is finished. Click "Done":
The wizard now shows the basic methods for managing the database and how to allow remote connections. To confirm that the cluster was deployed successfully and is accessible, click "Test Connection":
A successful-connection prompt means that the cluster can accept user requests normally:
For security reasons, a default MatrixDB installation does not allow remote connections. If you need to connect from a personal computer or another remote host, manually edit the $MASTER_DATA_DIRECTORY/pg_hba.conf file on the master node mdw and add a line like the following, which allows users from any IP address to connect to all databases with password authentication. To reduce security risks, restrict the IP range or database name according to your actual needs:
host all all 0.0.0.0/0 md5
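As an illustration of a narrower rule, the sketch below appends an entry that limits access to one database and one subnet, and skips the write if the same line is already present. add_hba_rule is a hypothetical helper, and the database name mydb and the range 192.168.100.0/24 are assumed values. The example runs against a scratch file rather than the real pg_hba.conf.

```shell
# Hypothetical helper: append a password-authenticated rule for one database
# and one CIDR range to a pg_hba.conf file, unless that line already exists.
add_hba_rule() {
    hba_file=$1 db=$2 cidr=$3
    rule="host $db all $cidr md5"
    grep -qxF "$rule" "$hba_file" 2>/dev/null || echo "$rule" >> "$hba_file"
}

# Example on a scratch file: allow only 192.168.100.0/24 to reach "mydb".
scratch=$(mktemp)
add_hba_rule "$scratch" mydb 192.168.100.0/24
cat "$scratch"
rm -f "$scratch"
```

To apply the same idea to a live cluster, pass "$MASTER_DATA_DIRECTORY/pg_hba.conf" as the first argument instead of the scratch file.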
After making these changes, run the following command so that the database reloads pg_hba.conf:
gpstop -u
The following commands start, stop, restart, and show the status of MatrixDB, in that order. Use --help to see more command parameters:
gpstart -a
gpstop -af
gpstop -arf
gpstate