Deploying a Pulsar cluster on AWS using Terraform and Ansible
One of the easiest ways to get a Pulsar cluster running on Amazon Web Services (AWS) is to use the Terraform infrastructure provisioning tool and the Ansible server automation tool. Terraform can create the resources necessary for running the Pulsar cluster (EC2 instances, networking and security infrastructure, and so on), while Ansible can install and run Pulsar on the provisioned resources.
To install a Pulsar cluster on AWS using Terraform and Ansible, you need the following:
- An AWS account and the aws command-line tool
- Python and pip
- The terraform-inventory tool, which enables Ansible to use Terraform artifacts
You also need to make sure that you are currently logged into your AWS account via the aws tool:
$ aws configure
Installation
You can install Ansible on Linux or macOS using pip.
$ pip install ansible
You can install Terraform by following the official Terraform installation instructions.
You also need to have the Terraform and Ansible configuration for Pulsar locally on your machine. You can find them in the GitHub repository of Pulsar, which you can fetch using Git commands:
$ git clone https://github.com/apache/pulsar
$ cd pulsar/deployment/terraform-ansible/aws
If you already have an SSH key and want to use it, you can skip the step of generating an SSH key and instead update the private_key_file setting in the ansible.cfg file and the public_key_path setting in the terraform.tfvars file.
For example, if you already have a private SSH key in ~/.ssh/pulsar_aws and a public key in ~/.ssh/pulsar_aws.pub, follow the steps below:
- update ansible.cfg with the following values:
private_key_file=~/.ssh/pulsar_aws
- update terraform.tfvars with the following values:
public_key_path=~/.ssh/pulsar_aws.pub
In order to create the necessary AWS resources using Terraform, you need to create an SSH key. Enter the following command to create a private SSH key in ~/.ssh/id_rsa and a public key in ~/.ssh/id_rsa.pub:
$ ssh-keygen -t rsa
Do not enter a passphrase (hit Enter instead when the prompt appears). Enter the following command to verify that a key has been created:
$ ls ~/.ssh
Create AWS resources using Terraform
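To start building AWS resources with Terraform, first install the Terraform dependencies by entering this command:
$ terraform init
# This creates a .terraform folder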
After that, you can apply the default Terraform configuration by entering this command:
$ terraform apply
You then see the following prompt:
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value:
Type yes and hit Enter. Applying the configuration could take several minutes. When the configuration has been applied, you see Apply complete! along with some other information, including the number of resources created.
You can apply a non-default Terraform configuration by changing the values of the variables defined in the terraform.tfvars file.
When you run the Ansible playbook, the following AWS resources are used:
- 9 total Elastic Compute Cloud (EC2) instances running the ami-9fa343e7 Amazon Machine Image (AMI), which runs Red Hat Enterprise Linux (RHEL) 7.4. By default, that includes:
  - 3 small VMs for ZooKeeper (t2.small instances)
  - 3 larger VMs for BookKeeper bookies (i3.xlarge instances)
  - 2 larger VMs for Pulsar brokers (c5.2xlarge instances)
  - 1 larger VM for the Pulsar proxy (c5.2xlarge instance)
- An EC2 security group
- A virtual private cloud (VPC) for security
- A route table for the Pulsar cluster’s VPC
- A subnet for the VPC
All EC2 instances for the cluster run in the us-west-2 region.
When you apply the Terraform configuration by entering the command terraform apply, Terraform outputs a value for pulsar_service_url. This value is the connection URL that clients use to reach the cluster, and it has the form pulsar://<host>:<port>.
You can fetch that value at any time by entering the command terraform output pulsar_service_url or by parsing the terraform.tfstate file (which is JSON, even though the filename does not reflect that):
$ jq '.modules[0].outputs.pulsar_service_url.value' terraform.tfstate
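If you would rather read the state file from a script than with jq, the lookup can be sketched in Python. This is a minimal sketch, not part of the Pulsar tooling: the function names are hypothetical, and the fallback to a top-level outputs key is an assumption about the layout written by newer Terraform releases, whereas the modules[0] path mirrors the jq filter above.

```python
import json


def pulsar_service_url(state):
    """Return the pulsar_service_url output from a parsed Terraform state dict."""
    modules = state.get("modules")
    if modules:
        # Layout addressed by the jq filter: outputs nested under modules[0].
        outputs = modules[0].get("outputs", {})
    else:
        # Assumption: newer Terraform releases keep outputs at the top level.
        outputs = state.get("outputs", {})
    return outputs["pulsar_service_url"]["value"]


def read_pulsar_service_url(path="terraform.tfstate"):
    """Load the state file (plain JSON) and extract the connection URL."""
    with open(path) as f:
        return pulsar_service_url(json.load(f))
```

Calling read_pulsar_service_url() from the directory that holds terraform.tfstate should give the same string as terraform output pulsar_service_url.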
At any point, you can destroy all AWS resources associated with your cluster using Terraform’s destroy command:
$ terraform destroy
Before you run the Pulsar playbook, you need to mount the disks to the correct directories on the bookie nodes. Since different types of machines have different disk layouts, you need to update the task defined in the setup-disk.yaml file after changing the instance_types in your Terraform configuration.
To set up the disks on the bookie nodes, enter this command:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
setup-disk.yaml
After that, the disks are mounted under /mnt/journal as the journal disk and /mnt/storage as the ledger disk. Enter this command only once. If you enter it again after you have run the Pulsar playbook, your disks might be erased again, causing the bookies to fail to start up.
Run the Pulsar playbook
Once you have created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible.
(Optional) If you want to use any Pulsar IO connectors, edit the Download Pulsar IO packages task in the deploy-pulsar.yaml file and uncomment the connectors you want to use.
To run the playbook, enter this command:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
../deploy-pulsar.yaml
If you have created a private SSH key at a location different from ~/.ssh/id_rsa, you can specify the different location using the --private-key flag in the following command:
$ ansible-playbook \
--user='ec2-user' \
--inventory=`which terraform-inventory` \
--private-key="~/.ssh/some-non-default-key" \
../deploy-pulsar.yaml
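If you drive the playbook from a script, the same command line can be assembled in Python. The sketch below is illustrative only: build_playbook_command and find_inventory_script are hypothetical helper names, while the flags and the ec2-user default are taken from the commands above.

```python
import os
import shutil


def find_inventory_script():
    """Locate terraform-inventory on PATH, like `which terraform-inventory`."""
    path = shutil.which("terraform-inventory")
    if path is None:
        raise FileNotFoundError("terraform-inventory was not found on PATH")
    return path


def build_playbook_command(playbook, inventory, user="ec2-user", private_key=None):
    """Return the ansible-playbook argument list for the given playbook."""
    cmd = ["ansible-playbook", f"--user={user}", f"--inventory={inventory}"]
    if private_key is not None:
        # Expand a leading ~ here, because no shell does it for us when the
        # list is handed to subprocess.run.
        cmd.append(f"--private-key={os.path.expanduser(private_key)}")
    cmd.append(playbook)
    return cmd
```

To run the playbook, pass the list to subprocess.run, for example subprocess.run(build_playbook_command("../deploy-pulsar.yaml", find_inventory_script()), check=True).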
You can now access your running Pulsar cluster using its unique Pulsar connection URL, which you can obtain by following the instructions above.
For a quick demonstration of accessing the cluster, we can use the Python client for Pulsar and the Python shell. First, install the Pulsar Python module using pip:
$ pip install pulsar-client
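Now, open up the Python shell using the python command:
$ python
Once you are in the shell, create a Pulsar client with your connection URL, create a producer on a topic, send a message, and close the client.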
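As a minimal sketch of what to enter, the snippet below connects to the cluster, publishes one message, and closes the connection. It assumes the pulsar-client module installed above. The helper names, the topic persistent://public/default/test-topic, and the message body are placeholders chosen for illustration, and the service URL must be the pulsar_service_url value of your own cluster.

```python
def send_test_message(client, topic, payload):
    """Publish a single message through an already-connected Pulsar client."""
    producer = client.create_producer(topic)
    try:
        producer.send(payload)  # payload must be bytes
    finally:
        producer.close()


def main(service_url):
    """Connect to the cluster at service_url and send one test message."""
    # Imported here so the helper above can be used without the module installed.
    import pulsar

    client = pulsar.Client(service_url)
    try:
        # Placeholder topic and payload; any topic name would do for a smoke test.
        send_test_message(client, "persistent://public/default/test-topic", b"Hello world")
    finally:
        client.close()
```

Call main with the value printed by terraform output pulsar_service_url. If the call returns without raising an exception, the message was accepted by the cluster.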
If all of these commands are successful, Pulsar clients can now use your cluster!