Multi Node Condor and Pegasus on Ubuntu 12.04 on AWS EC2


Recently I needed to run some experiments with the Montage astronomical image mosaic engine, using Pegasus as the workflow management system. This involves setting up a Condor cluster, installing Pegasus on the submit host, and several other steps needed to run Montage in such an environment. After an extensive search on the Internet, I found no good documentation on how to accomplish this complicated task with my favorite Linux distribution, Ubuntu 12.04. I decided to write a tutorial on this topic, in the hope that it might save someone else's time in the future.

This tutorial includes the following three parts:

Single Node Condor and Pegasus on Ubuntu 12.04 on AWS EC2

Multi Node Condor and Pegasus on Ubuntu 12.04 on AWS EC2

Running Montage with Pegasus on AWS EC2

In the previous tutorial we set up a single-node Condor cluster with Pegasus. Now we will expand the Condor cluster to include multiple worker nodes. Keep the previous EC2 instance running; we will call this instance the Master Node. The other Condor nodes will receive tasks from the Master Node, so we will call them Worker Nodes.

In this tutorial, we will show how to add one Worker Node to the cluster.

[STEP 1: Updating Security Group Settings]

The Master Node and the Worker Node must be able to communicate with each other. The easiest way to achieve this is to run both nodes in the same VPC and assign the same security group to both. Edit the inbound rules of the security group and add a rule that allows all traffic from within the security group.
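The same rule can also be added from the command line instead of the web console. The following is only a sketch: it assumes the AWS CLI is installed and configured with credentials, and the function name and the security group ID in the usage comment are placeholders of mine, not values from this setup.

```shell
# Sketch: add an inbound rule that allows all traffic between members
# of the same security group. Assumes a configured AWS CLI.
# $1 = ID of the security group shared by the Master and Worker Nodes
allow_intra_group_traffic() {
    sg_id="$1"
    # IpProtocol=-1 means "all protocols"; the source of the rule is
    # the security group itself.
    aws ec2 authorize-security-group-ingress \
        --group-id "$sg_id" \
        --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=$sg_id}]"
}

# Usage (hypothetical group ID):
#   allow_intra_group_traffic sg-0123456789abcdef0
```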

[STEP 2: Install Condor]

As in the previous tutorial, download the latest version of HTCondor (native package) for Ubuntu 12.04 from the following URL, this time on the Worker Node. I downloaded condor-8.1.6-247684-ubuntu_12.04_amd64.deb; the actual filename may change over time.

http://research.cs.wisc.edu/htcondor/downloads/

Install Condor using the following commands:

$ sudo dpkg -i condor-8.1.6-247684-ubuntu_12.04_amd64.deb
$ sudo apt-get update
$ sudo apt-get install -f
$ sudo apt-get install chkconfig
$ sudo chkconfig condor on
$ sudo service condor start

Condor should now be up and running, and it should start automatically when the system boots. Check the status of Condor using the following commands:

$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@ip-10-0-5-11 LINUX      X86_64 Unclaimed Benchmar  0.060 1862  0+00:00:04
slot2@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:00:05
slot3@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:00:06
slot4@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:00:07
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     4     0       0         4       0          0        0

               Total     4     0       0         4       0          0        0

$ condor_q


-- Submitter: ip-10-0-5-114.ec2.internal :  : ip-10-0-5-114.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

[STEP 3: Configure Condor Master Node]

Use a text editor to open /etc/condor/condor_config, and add the following line to the end of the file:

ALLOW_WRITE = *
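The wildcard lets any host that can reach the Master Node write to the pool. If you would rather restrict this to your own subnet, HTCondor also accepts wildcard IP patterns here. Below is a hedged sketch of a helper that appends the setting only when it is not already the last ALLOW_WRITE entry in the file (later entries in a config file override earlier ones). The function name is mine, and the 10.0.5.* pattern is only a guess based on the hostnames in the sample output; use your own subnet.

```shell
# Sketch: make "ALLOW_WRITE = <pattern>" the effective setting in a
# Condor config file without appending duplicate lines.
# $1 = config file, $2 = host pattern (assumed value, adjust to your VPC)
set_allow_write() {
    config_file="$1"
    pattern="$2"
    # Find the last ALLOW_WRITE assignment in the file, if any.
    last_setting=$(grep '^ALLOW_WRITE[[:space:]]*=' "$config_file" | tail -n 1)
    if [ "$last_setting" != "ALLOW_WRITE = $pattern" ]; then
        printf 'ALLOW_WRITE = %s\n' "$pattern" >> "$config_file"
    fi
}

# Usage (in a root shell):
#   set_allow_write /etc/condor/condor_config '10.0.5.*'
```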

Then restart Condor with the following command:

$ sudo service condor restart

Also, note the private IP address of the Master Node (ifconfig will show it); you will need it to configure the Worker Node.
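If you want to script this step, the following sketch prints the first address reported by hostname -I, which on an EC2 instance with a single network interface should be the private IP. The function name is my own.

```shell
# Sketch: print the first IP address of this host.
# Assumes a single network interface, as on a default EC2 instance.
master_private_ip() {
    hostname -I | awk '{print $1}'
}

# On EC2 the instance metadata service reports the same address:
#   curl -s http://169.254.169.254/latest/meta-data/local-ipv4
```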

[STEP 4: Configure Condor Worker Node]

Now we configure the Worker Node. Use a text editor to open /etc/condor/condor_config.local, find the following line

CONDOR_HOST = $(FULL_HOSTNAME)

and replace its value with the IP address of the Master Node. Assuming that the IP address of the Master Node is 192.168.1.1, the line should look like the following

CONDOR_HOST = 192.168.1.1
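The same edit can be made without a text editor, which is handy when preparing several nodes. This is a minimal sketch using GNU sed; the function name is hypothetical, and it assumes the file already contains a CONDOR_HOST line, as described above.

```shell
# Sketch: point CONDOR_HOST in a Condor config file at the Master Node.
# $1 = config file, $2 = IP address of the Master Node
set_condor_host() {
    config_file="$1"
    master_ip="$2"
    # Refuse to continue if there is no CONDOR_HOST line to rewrite.
    if ! grep -q '^CONDOR_HOST[[:space:]]*=' "$config_file"; then
        echo "no CONDOR_HOST line in $config_file" >&2
        return 1
    fi
    # Replace the whole assignment in place (GNU sed -i).
    sed -i "s/^CONDOR_HOST[[:space:]]*=.*/CONDOR_HOST = $master_ip/" "$config_file"
}

# Usage (in a root shell):
#   set_condor_host /etc/condor/condor_config.local 192.168.1.1
```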

Then restart Condor using the following command:

$ sudo service condor restart

Now, on both the Master Node and the Worker Node, condor_status shows both nodes. In the following example, both the Master Node and the Worker Node are c3.xlarge instances. Each c3.xlarge instance has 4 vCPUs, so we see 8 slots in the cluster. The Name column truncates the hostnames, which is why the slots of the two nodes look alike.

$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:04:36
slot2@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:05
slot3@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:06
slot4@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:07
slot1@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.040 1862  0+00:04:36
slot2@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:05
slot3@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:06
slot4@ip-10-0-5-11 LINUX      X86_64 Unclaimed Idle      0.000 1862  0+00:05:07
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     8     0       0         8       0          0        0

               Total     8     0       0         8       0          0        0
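To check the slot count from a script rather than by eye, you can count the slot lines in the default condor_status output. This is just a sketch that relies on the output format shown above; the function name is mine.

```shell
# Sketch: count slot lines in condor_status output read from stdin.
# Relies on each slot line starting with "slotN@", as in the listing above.
count_slots() {
    grep -c '^slot[0-9]*@'
}

# Usage: with two c3.xlarge nodes (4 vCPUs each) this should print 8.
#   condor_status | count_slots
```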

[STEP 5: Add More Worker Nodes]

To add more Worker Nodes, you can create an AMI from the first Worker Node, then launch as many Worker Nodes as needed. Since the AMI contains the configuration described above, the new nodes should join the cluster automatically once they are running.
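Both steps can be done from the web console, or scripted with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured; the function names and all IDs in the usage comments are placeholders, and c3.xlarge simply matches the instance type used in this tutorial.

```shell
# Sketch: create an AMI from the first Worker Node.
# $1 = instance ID of the Worker Node, $2 = name for the new AMI
create_worker_ami() {
    instance_id="$1"
    ami_name="$2"
    aws ec2 create-image --instance-id "$instance_id" --name "$ami_name"
}

# Sketch: launch additional Worker Nodes from that AMI, in the same
# subnet and security group as the Master Node.
# $1 = AMI ID, $2 = number of nodes, $3 = security group ID, $4 = subnet ID
launch_workers() {
    ami_id="$1"
    count="$2"
    sg_id="$3"
    subnet_id="$4"
    aws ec2 run-instances \
        --image-id "$ami_id" \
        --count "$count" \
        --instance-type c3.xlarge \
        --security-group-ids "$sg_id" \
        --subnet-id "$subnet_id"
}

# Usage (hypothetical IDs):
#   create_worker_ami i-0123456789abcdef0 condor-worker
#   launch_workers ami-0123456789abcdef0 3 sg-0123456789abcdef0 subnet-0123456789abcdef0
```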
