dsage update

sage No Comments

A pretty big rewrite of the worker part of dsage has landed on trac. It’s ticket #3600 on the Sage trac and is eagerly awaiting review. Here’s a quick rundown of the major changes:

1. Workers no longer poll the server for new jobs.
2. Workers no longer poll Sage to find out when a job finishes.
3. Doctests run much more reliably now, and in much less time (no need for # long time now).
4. The worker and the server both use twistd now, which makes things like running them under a profiler trivial.

dsage news

sage 3 Comments

It’s been a while since I’ve blogged about dsage so here’s a big braindump on what’s been happening…

I was really happy with the amount of activity and interest in distributed computing at Sage Devel Days 1. I think the major participants were William Stein, Glenn Tarbox and Bill Furnish.

As the week went on it became clear that dsage as it stands right now does not fulfill the needs of some (maybe even many, or most) users. As a result, I am aware of several “next gen” dsage proposals, such as dsageNG and some stuff that Bill wrote (I think by the time dev1 was over, Bill was at dsage4 ;-)). Bill and I chatted a bit about the current architecture of dsage, and the major problem he saw was that dsage workers used polling, so there could be a significant delay (i.e., seconds) between when a new job arrived and when the workers started processing it.

I thought about this for a while and I think that he’s absolutely right. Therefore, I’ve rewritten parts of the worker/server code so now workers will start on a job immediately. This should bring down the overhead of running dsage jobs considerably. 
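To make the difference concrete, here is a minimal sketch of push-style workers. It is not the dsage code (which is built on Twisted); it uses plain threads and a queue from the Python standard library, and run_workers is a made-up name. The point is only that a worker blocked on get() wakes up the moment a job arrives, instead of sleeping between polls:

```python
import queue
import threading

def run_workers(jobs, num_workers=2):
    """Run each zero-argument callable in `jobs` on a small pool of workers."""
    job_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            # Blocks until a job is pushed; there is no sleep/poll loop.
            item = job_queue.get()
            if item is None:  # sentinel: no more work for this worker
                break
            job_id, func = item
            value = func()
            with lock:
                results[job_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job_id, func in enumerate(jobs):
        job_queue.put((job_id, func))
    for _ in threads:
        job_queue.put(None)  # one sentinel per worker, queued after all jobs
    for t in threads:
        t.join()
    # Return results in submission order.
    return [results[i] for i in range(len(jobs))]
```

A polling worker would instead ask "anything new?" every few seconds, and each job would wait out the rest of that interval before starting.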

The rewrite is also more idiomatic Twisted, since it uses Twisted’s asynchronous process communication. It was actually surprisingly easy to write a worker pool using the existing tools in the framework.
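For readers who have not seen this style before, here is a rough sketch of a pool of child processes driven asynchronously. To keep it dependency-free it uses asyncio from the standard library rather than Twisted, so it is an analogy and not what dsage does; run_in_subprocess, run_pool and run_all are names invented for this example:

```python
import asyncio
import sys

async def run_in_subprocess(code):
    """Run a Python snippet in a child process and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()  # waits without blocking the event loop
    return out.decode().strip()

async def run_pool(snippets, pool_size=2):
    """Run all snippets, with at most `pool_size` child processes at once."""
    slots = asyncio.Semaphore(pool_size)

    async def guarded(code):
        async with slots:
            return await run_in_subprocess(code)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(c) for c in snippets))

def run_all(snippets, pool_size=2):
    return asyncio.run(run_pool(snippets, pool_size))
```

Nothing here ever asks a child "are you done yet?"; the event loop is notified when output arrives or the process exits.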

I also added a convenience function called eval_function() that allows you to submit a live function as a job. This works for any function that can be pickled (and whose arguments can be pickled as well, of course). For example:

sage: def f(n):
...     return n*n
...
sage: j = d.eval_function(f, ((25,),{}), job_name='square')
sage: j
625
sage: print j.wall_time
0:00:00.144780

This is much, much faster than it was before the worker rewrite.
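The second argument in the eval_function call above is an (args, kwargs) pair: a tuple of positional arguments and a dict of keyword arguments. Here is a small sketch of how such a pair can be shipped around and applied, assuming that is how the pair is interpreted; pack_call and run_packed are hypothetical helpers, not dsage functions:

```python
import pickle

def pack_call(args=(), kwargs=None):
    """Bundle positional and keyword arguments as an (args, kwargs) pair."""
    return (tuple(args), dict(kwargs or {}))

def run_packed(func, packed):
    """Round-trip the arguments through pickle, then call func with them."""
    # The round trip stands in for sending the arguments to a worker.
    args, kwargs = pickle.loads(pickle.dumps(packed))
    return func(*args, **kwargs)

def square(n):
    return n * n
```

Note that the standard pickle module stores a function by reference (module and name), so the receiving side must be able to import it; this sketch only pickles the arguments.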

Having eval_function also makes it really easy to do the map part of map reduce (it’s also being referred to as scatter/gather). For example:

sage: jobs = d.map(f, [10..20])
sage: jobs
[100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
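If you want to play with the same scatter/gather pattern without a dsage server, a local stand-in is easy to write. This sketch uses a thread pool from the Python standard library; scatter_gather is a name made up for the example and has nothing to do with how d.map is implemented:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

def scatter_gather(func, inputs, max_workers=4):
    """Apply func to every input on a pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(func, inputs))
```

Calling scatter_gather(square, range(10, 21)) gives the same list of squares as the d.map example.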

Also, dsage now supports the new @parallel convenience decorator that William wrote:

sage: P = parallel(p_iter = d.parallel_iter)
sage: @P
....: def f(n,m):
....:     return n+m
....: 
sage: f([(1,2), (5, 10/3)])
[((1, 2), 3), ((5, 10/3), 25/3)]
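As the output shows, the decorated function takes a list of argument tuples and hands back each tuple paired with its result. A plain-Python sketch of just that calling convention follows; it runs sequentially, and pairs_with_inputs is an invented name, not part of Sage or dsage:

```python
from fractions import Fraction

def pairs_with_inputs(func):
    """Make func accept a list of argument tuples and return (args, result) pairs."""
    def wrapper(arg_tuples):
        return [(args, func(*args)) for args in arg_tuples]
    return wrapper

@pairs_with_inputs
def add(n, m):
    return n + m
```

With Fraction(10, 3) standing in for Sage's 10/3, add([(1, 2), (5, Fraction(10, 3))]) produces the same pairs as above.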

William and I both agree that the strategy for Sage should be to have something that is very fast on local multicore machines (multiprocessing module comes to mind) while also having something that will work both on local clusters and WANs (dsage).

If anyone is interested in helping me to test out the new version and making it more robust, please let me know!

I’m hoping that this stuff will make it into the next major release of Sage, since I will be going on vacation for 6 weeks (hooray) on July 14th.

publishing your dotfiles

linux, sage 16 Comments

I discovered freehg.org today, which is a free mercurial repo hosting site. In spirit it’s very akin to github, but obviously lacks many of the features and much of the polish. However, for this particular purpose, it’s more than sufficient.

If you access many *nix machines on a regular basis, you’ve probably been annoyed that your custom configuration files are not immediately available. I used to scp config files around all the time, but that got old quick. I’m going to show you my current setup for making sure that all the dotfiles (zshrc, vimrc, etc) that I use are version controlled and easily accessible.

First, collect all your dotfiles in one directory and make it an hg repository. I use ~/.dotfiles, but you can use whatever you like. Here is what my .dotfiles looks like:

iapetus:~/.dotfiles> ls -l
total 54k
-rw-r--r-- 1 yqiang staff   85 2008-07-04 10:20 ackrc
-rw-r--r-- 1 yqiang staff   50 2008-05-29 18:24 bash_profile
-rw-r--r-- 1 yqiang staff 2.0k 2008-05-29 18:24 bashrc
-rw-r--r-- 1 yqiang staff  569 2008-07-04 14:09 create_symlinks.py
drwx------ 3 yqiang staff  102 2008-06-13 15:52 gtk-2.0
-rw-r--r-- 1 yqiang staff  624 2008-07-04 09:48 gvimrc
-rw-r--r-- 1 yqiang staff   48 2008-06-17 15:46 hgignore
-rw-r--r-- 1 yqiang staff  454 2008-07-04 10:30 hgrc
drwx------ 5 yqiang staff  170 2008-06-29 17:55 irssi
-rw-r--r-- 1 yqiang staff  403 2008-06-22 11:20 pdbrc
-rw-r--r-- 1 yqiang staff  642 2008-06-29 10:49 screenrc
-rw-r--r-- 1 yqiang staff 6.8k 2008-07-04 13:39 vimrc
-rw-r--r-- 1 yqiang staff 5.8k 2008-06-29 17:43 zshrc

Then create a freehg.org account and initialize a public repo there. You can find mine at:

http://freehg.org/u/yqiang/dotfiles/

Now, when you access a new machine, to get all your dotfiles in order, just do:

veritas:~ > hg clone http://freehg.org/u/yqiang/dotfiles/ .dotfiles
requesting all changes
adding changesets
adding manifests
adding file changes
added 22 changesets with 37 changes to 15 files
15 files updated, 0 files merged, 0 files removed, 0 files unresolved
veritas:~ > cd .dotfiles
veritas:~/.dotfiles > python create_symlinks.py
Symlinking /home/yqiang/.dotfiles/zshrc to /home/yqiang/.zshrc
Symlinking /home/yqiang/.dotfiles/gvimrc to /home/yqiang/.gvimrc
Symlinking /home/yqiang/.dotfiles/bash_profile to /home/yqiang/.bash_profile
Symlinking /home/yqiang/.dotfiles/hgignore to /home/yqiang/.hgignore
Symlinking /home/yqiang/.dotfiles/bashrc to /home/yqiang/.bashrc
Symlinking /home/yqiang/.dotfiles/hgrc to /home/yqiang/.hgrc
Symlinking /home/yqiang/.dotfiles/pdbrc to /home/yqiang/.pdbrc
Symlinking /home/yqiang/.dotfiles/ackrc to /home/yqiang/.ackrc
Symlinking /home/yqiang/.dotfiles/screenrc to /home/yqiang/.screenrc
Symlinking /home/yqiang/.dotfiles/vimrc to /home/yqiang/.vimrc

create_symlinks.py is a simple Python script that will create the symlinks for you. Here is the code for it:

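#!/usr/bin/env python
# Symlink every file in ~/.dotfiles to ~/.<name>.
import os

home = os.path.abspath(os.environ['HOME'])
path = os.path.join(home, '.dotfiles')
# Entries that should not be linked into the home directory.
excludes = ['gtk-2.0', 'create_symlinks.py']

for f in os.listdir(path):
    # Skip hidden entries such as the repository's own .hg directory.
    if f.startswith('.'):
        continue
    if f not in excludes:
        dst = os.path.join(home, '.' + f)
        # Build the source from the repo path so the script works from any directory.
        src = os.path.join(path, f)
        try:
            print("Symlinking %s to %s" % (src, dst))
            os.symlink(src, dst)
        except Exception as msg:
            print("Failed to symlink %s to %s" % (src, dst))
            print(msg)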

Tada. All your config files are in place now. If you’re really into it, you can run a cron script that automatically does an hg pull so you don’t even have to think about it.
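Since the files in your home directory are symlinks into the repository, updating the repository is all it takes to refresh them. Here is a small sketch of a script that cron could run; the function names and the default ~/.dotfiles location are assumptions for the example, so adjust them to your setup:

```python
import os
import subprocess

def pull_command(repo_dir):
    """Build the hg command that pulls new changesets and updates the working copy."""
    # -R selects the repository; `pull -u` updates the working directory after pulling.
    return ["hg", "-R", repo_dir, "pull", "-u"]

def pull_dotfiles(repo_dir=None):
    """Run the pull in repo_dir (default: ~/.dotfiles) and return hg's exit code."""
    if repo_dir is None:
        repo_dir = os.path.join(os.path.expanduser("~"), ".dotfiles")
    return subprocess.call(pull_command(repo_dir))
```

Save it with a final line that calls pull_dotfiles(), and point a crontab entry at the file at whatever interval suits you.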
