Pegasus / Montage workflow on Amazon Web Services

I took some notes while going through Mats Rynge’s tutorial on “Pegasus / Montage workflow on Amazon Web Services”. The tutorial is officially available at the following URL, but the content is far from complete. I managed to finish the tutorial, and thought that my experience might be valuable for someone else out there in the dark.

https://confluence.pegasus.isi.edu/display/pegasus/2013+-+Montage+workflow+using+Amazon+Web+Services

Step 1. Launch an EC2 instance

In the Oregon region, launch an EC2 instance with AMI ami-f4e47cc4. It is recommended that you use the same security group for all EC2 instances to be launched in the Condor cluster. Also, in the security group, allow all communication between EC2 instances that use the same security group.

SSH into the instance, then generate a key pair and append the public key to authorized_keys, using the following commands:

ssh -i keypair.pem montage@IP
ssh-keygen
cd .ssh
cat id_rsa.pub >> authorized_keys
cd ~
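
The key setup above can be made safe to re-run. The following is a minimal sketch, not part of the original tutorial: authorize_key is a hypothetical helper that appends a public key to an authorized_keys file only when that exact line is not already present, then tightens the file permissions.

```shell
#!/bin/sh
# authorize_key is a hypothetical helper (not from the tutorial).
# Usage: authorize_key PUBLIC_KEY_FILE AUTHORIZED_KEYS_FILE
authorize_key() {
    pub_file=$1
    auth_file=$2
    key=$(cat "$pub_file")
    # Create the target file if it does not exist yet.
    touch "$auth_file"
    # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string.
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # Restrict the file to its owner, which sshd commonly requires.
    chmod 600 "$auth_file"
}
```

On the instance this would be called as authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys. Afterwards, ssh 127.0.0.1 should log in without asking for a password, which is what the scp:// file server in site.xml relies on.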

Step 2. Configure Condor

Edit /etc/condor/config.d/20_security.conf and update ALLOW_WRITE and ALLOW_READ to match the IP range of your VPC. For example, if the IP range of your VPC is 172.31.0.0/16, then you can set “ALLOW_WRITE = 172.31.*” and “ALLOW_READ = 172.31.*”. Then you need to restart Condor with the following command:

sudo service condor restart
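
If your VPC uses a different range, the wildcard can be derived from the CIDR block. This is a small sketch under the assumption that the prefix length is /8, /16, or /24, since other lengths do not map onto a single trailing wildcard. cidr_to_wildcard is a hypothetical helper name.

```shell
#!/bin/sh
# cidr_to_wildcard is a hypothetical helper (not from the tutorial).
# Prints a Condor-style wildcard for a /8, /16 or /24 CIDR block.
cidr_to_wildcard() {
    cidr=$1
    addr=${cidr%/*}   # part before the slash, e.g. 172.31.0.0
    bits=${cidr#*/}   # part after the slash, e.g. 16
    # Split the address into its four octets ($1..$4).
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    case $bits in
        8)  echo "$1.*" ;;
        16) echo "$1.$2.*" ;;
        24) echo "$1.$2.$3.*" ;;
        *)  echo "unsupported prefix length: /$bits" >&2
            return 1 ;;
    esac
}

cidr_to_wildcard 172.31.0.0/16   # prints 172.31.*
```

After the restart, running condor_config_val ALLOW_WRITE on the instance shows the value Condor actually picked up.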

Step 3. Create a Montage workflow

mkdir workflow
cd workflow
mDAG 2mass j M17 0.5 0.5 0.0002777778 . file://$PWD file://$PWD/inputs
generate-montage-replica-catalog 
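
The numeric arguments to mDAG are easier to read with a little arithmetic. The sketch below assumes, based on my reading of the Montage usage text, that 0.5 0.5 are the mosaic width and height in degrees and that 0.0002777778 is the pixel scale in degrees per pixel. Under that assumption the pixel scale is one arcsecond and the mosaic is about 1800 pixels on a side. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for interpreting the mDAG arguments (assumed meanings).

# Convert a pixel scale in arcseconds to degrees (1 degree = 3600 arcseconds).
arcsec_to_deg() {
    awk -v a="$1" 'BEGIN { printf "%.10f\n", a / 3600 }'
}

# Approximate mosaic size in pixels: width in degrees / pixel scale in degrees.
mosaic_pixels() {
    awk -v w="$1" -v c="$2" 'BEGIN { printf "%.0f\n", w / c }'
}

arcsec_to_deg 1                  # prints 0.0002777778
mosaic_pixels 0.5 0.0002777778   # prints 1800
```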

Step 4. Copy the workflow files for Pegasus

cd ~/etc
cp ../workflow/replica.catalog .
cp ../workflow/dag.xml .

Step 5. Update site.xml with the following content

<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd" version="4.0">
    <site  handle="local" arch="x86" os="LINUX">
        <directory type="shared-scratch" path="/home/montage/scratch">
            <file-server operation="all" url="file:///home/montage/scratch"/>
        </directory>
        <directory type="local-storage" path="/var/www/html/outputs">
            <file-server operation="all" url="file:///var/www/html/outputs"/>
        </directory>
        <profile namespace="env" key="SSH_PRIVATE_KEY">/home/montage/.ssh/id_rsa</profile>
    </site>
    <site  handle="condor_pool" arch="x86_64" os="LINUX">
        <directory type="shared-scratch" path="/home/montage/scratch">
            <file-server operation="all" url="scp://127.0.0.1/home/montage/scratch"/>
        </directory>
        <profile namespace="pegasus" key="style" >condor</profile>
        <profile namespace="condor" key="universe" >vanilla</profile>
        <profile namespace="env" key="MONTAGE_HOME" >/opt/montage/v3.3</profile>
    </site>
</sitecatalog>

Step 6. Plan the workflow

pegasus-plan --conf pegasus.conf --dax dag.xml
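
Planning depends on the files prepared in the earlier steps, so a quick pre-flight check can save a confusing error. This is a sketch only: check_plan_inputs is a hypothetical helper, and it assumes that pegasus.conf, dag.xml, replica.catalog and site.xml all live in the same directory (~/etc in this walkthrough).

```shell
#!/bin/sh
# check_plan_inputs is a hypothetical helper (not from the tutorial).
# Returns 0 if every expected file exists and is non-empty, 1 otherwise.
check_plan_inputs() {
    dir=$1
    missing=0
    for f in pegasus.conf dag.xml replica.catalog site.xml; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return $missing
}
```

A possible use is check_plan_inputs ~/etc && pegasus-plan --conf pegasus.conf --dax dag.xml, run from inside ~/etc.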

Step 7. Run the workflow

pegasus-run  /home/montage/etc/montage/pegasus/montage/run0001

Step 8. Monitor the workflow

pegasus-status -l /home/montage/run-2.0/montage/pegasus/montage/run0001
condor_status
condor_q
tail montage/pegasus/montage/run0001/jobstate.log

The last command is especially useful because it shows which jobs are currently being run. Please note that you need to replace the paths in the commands above with the actual run directory given to you by Pegasus.
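
For a longer workflow, a summary of jobstate.log can be easier to read than its tail. The sketch below assumes each line has the form "timestamp job-name event ...", with the event name in the third whitespace-separated field, and that lines whose second field is INTERNAL are DAGMan markers rather than job events. summarize_jobstate is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_jobstate is a hypothetical helper (not from the tutorial).
# Counts job events by type, assuming the event name is field 3.
summarize_jobstate() {
    awk '$2 != "INTERNAL" { count[$3]++ }
         END { for (e in count) print e, count[e] }' "$1" | sort
}
```

Calling summarize_jobstate montage/pegasus/montage/run0001/jobstate.log prints one line per event type with its count. To watch events as they arrive instead, tail -f on the same file follows the log.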

That’s all. After weeks of searching the Internet, everything now seems simple.
