A pretty big rewrite of the worker part of dsage has now landed on trac. It’s ticket #3600 on the Sage trac and is eagerly awaiting review. Here’s a quick rundown of the major changes:
1. Workers no longer poll the server for new jobs.
code, math, life
It’s been a while since I’ve blogged about dsage so here’s a big braindump on what’s been happening…
I was really happy with the amount of activity and interest in distributed computing at Sage Devel Days 1. I think the major participants were William Stein, Glenn Tarbox and Bill Furnish.
As the week went on it became clear that dsage as it stands right now does not fulfill the needs of some (maybe even many, or most) users. As a result, I am currently aware of several “next gen” dsage proposals, such as dsageNG and some stuff that Bill wrote (I think by the time dev1 was over, Bill was at dsage4).
Bill and I chatted a bit about the current architecture of dsage, and the major problem he saw was that dsage workers used polling, so there could be a significant delay (i.e., seconds) between when a new job arrived and when the workers started processing it.
I thought about this for a while and I think that he’s absolutely right. Therefore, I’ve rewritten parts of the worker/server code so now workers will start on a job immediately. This should bring down the overhead of running dsage jobs considerably.
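To make the polling overhead concrete, here is a minimal sketch using only Python's standard library. It is not dsage's code (dsage is built on Twisted); the names polling_worker, blocking_worker and measure_latency are made up for illustration. It compares a worker that checks for jobs once per interval with one that blocks until a job arrives.

```python
import queue
import threading
import time


def polling_worker(jobs, results, poll_interval):
    """Hypothetical polling worker: looks for work once per poll_interval."""
    while True:
        time.sleep(poll_interval)          # idle until the next poll
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            continue
        if job is None:                    # sentinel: shut down
            return
        results.append((job, time.monotonic()))


def blocking_worker(jobs, results):
    """Hypothetical push-style worker: wakes up as soon as a job is queued."""
    while True:
        job = jobs.get()                   # blocks until something arrives
        if job is None:
            return
        results.append((job, time.monotonic()))


def measure_latency(worker, *extra_args):
    """Return the seconds between submitting a job and a worker picking it up."""
    jobs, results = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(jobs, results) + extra_args)
    thread.start()
    time.sleep(0.05)                       # let the worker go idle first
    submitted = time.monotonic()
    jobs.put("square")
    jobs.put(None)
    thread.join()
    return results[0][1] - submitted


if __name__ == "__main__":
    print("polling :", measure_latency(polling_worker, 0.3))
    print("blocking:", measure_latency(blocking_worker))
```

With a poll interval of a few seconds, the first number grows accordingly, while the blocking worker's pickup delay stays tiny.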
The rewrite is also more idiomatic Twisted, since it uses Twisted’s asynchronous process communication. It was actually surprisingly easy to write a worker pool using the existing tools in the framework.
I also added a convenience function called eval_function() that allows you to submit a live function as a job. This works for any function that can be pickled (and whose arguments can be pickled as well, of course). For example:
sage: def f(n):
...       return n*n
...
sage: j = d.eval_function(f, ((25,),{}), job_name='square')
sage: j
625
sage: print j.wall_time
0:00:00.144780
This is much, much faster than it was before the workers were rewritten.
Having eval_function also makes it really easy to do the map part of map-reduce (it’s also referred to as scatter/gather). For example:
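The ((25,),{}) argument in the example above is a pair of positional arguments and keyword arguments. A minimal local sketch of that calling convention follows; local_eval_function is a made-up name, not part of dsage, and only the arguments are round-tripped through pickle here, whereas dsage also has to ship the function itself to a worker.

```python
import pickle


def local_eval_function(func, args_and_kwargs):
    """Hypothetical local stand-in: call func with an (args, kwargs) pair."""
    # Round-trip the arguments through pickle to mimic sending them to a worker.
    args, kwargs = pickle.loads(pickle.dumps(args_and_kwargs))
    return func(*args, **kwargs)


def square(n):
    return n * n


def power(base, exponent=2):
    return base ** exponent


if __name__ == "__main__":
    print(local_eval_function(square, ((25,), {})))
    print(local_eval_function(power, ((2,), {"exponent": 10})))
```

Anything that pickle rejects as an argument (an open file handle, for instance) would fail at the pickle.dumps step, which is the restriction the post mentions.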
sage: jobs = d.map(f, [10..20])
sage: jobs
[100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
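For readers without a dsage server at hand, the same scatter/gather pattern can be sketched locally with the standard library. This assumes a thread pool as the stand-in for remote workers; scatter_gather is a hypothetical helper, not dsage's d.map.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(func, inputs, max_workers=4):
    """Apply func to every input on a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map scatters the calls and gathers results in input order.
        return list(pool.map(func, inputs))


def square(n):
    return n * n


if __name__ == "__main__":
    # range(10, 21) covers the same values as Sage's [10..20].
    print(scatter_gather(square, range(10, 21)))
```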
Also, dsage now supports the new @parallel convenience decorator that William wrote:
sage: P = parallel(p_iter = d.parallel_iter)
sage: @P
....: def f(n,m):
....:     return n+m
....:
sage: f([(1,2), (5, 10/3)])
[((1, 2), 3), ((5, 10/3), 25/3)]
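The idea behind such a decorator can be sketched in plain Python. This is not Sage's implementation: parallel_sketch and thread_iter are hypothetical names, and the iterator signature (a function plus a list of argument tuples) is an assumption made for the example. It only mimics the output shape shown above, a list of (arguments, result) pairs.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction
import functools


def thread_iter(func, arg_tuples):
    """Hypothetical stand-in for d.parallel_iter: yields (args, result) pairs."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda args: func(*args), arg_tuples)
        for args, result in zip(arg_tuples, results):
            yield args, result


def parallel_sketch(p_iter):
    """Build a decorator that runs the function over a list of argument tuples."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(arg_tuples):
            return list(p_iter(func, list(arg_tuples)))
        return wrapper
    return decorator


@parallel_sketch(p_iter=thread_iter)
def add(n, m):
    return n + m


if __name__ == "__main__":
    # Fraction(10, 3) plays the role of Sage's exact rational 10/3.
    print(add([(1, 2), (5, Fraction(10, 3))]))
```

Swapping thread_iter for another iterator with the same signature changes where the work runs without touching the decorated function, which is the point of passing p_iter in.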
William and I both agree that the strategy for Sage should be to have something that is very fast on local multicore machines (multiprocessing module comes to mind) while also having something that will work both on local clusters and WANs (dsage).
If anyone is interested in helping me to test out the new version and making it more robust, please let me know!
I’m hoping that this stuff will make it into the next major release of Sage since I will be going on vacation for 6 weeks (hooray) on July 14th.
I discovered freehg.org today, which is a free mercurial repo hosting site. In spirit it’s very akin to github, but obviously lacks many of the features and much of the polish. However, for this particular purpose, it’s more than sufficient.
If you access many *nix machines on a regular basis, you’ve probably been annoyed that your custom configuration files are not immediately available. I used to scp config files around all the time, but that got old quick. I’m going to show you my current setup for making sure that all the dotfiles (zshrc, vimrc, etc) that I use are version controlled and easily accessible.
First, collect all your dotfiles in one directory and make it an hg repository. I use ~/.dotfiles; you can use whatever you like. Here is what my .dotfiles looks like:
iapetus:~/.dotfiles> ls -l
total 54k
-rw-r--r-- 1 yqiang staff   85 2008-07-04 10:20 ackrc
-rw-r--r-- 1 yqiang staff   50 2008-05-29 18:24 bash_profile
-rw-r--r-- 1 yqiang staff 2.0k 2008-05-29 18:24 bashrc
-rw-r--r-- 1 yqiang staff  569 2008-07-04 14:09 create_symlinks.py
drwx------ 3 yqiang staff  102 2008-06-13 15:52 gtk-2.0
-rw-r--r-- 1 yqiang staff  624 2008-07-04 09:48 gvimrc
-rw-r--r-- 1 yqiang staff   48 2008-06-17 15:46 hgignore
-rw-r--r-- 1 yqiang staff  454 2008-07-04 10:30 hgrc
drwx------ 5 yqiang staff  170 2008-06-29 17:55 irssi
-rw-r--r-- 1 yqiang staff  403 2008-06-22 11:20 pdbrc
-rw-r--r-- 1 yqiang staff  642 2008-06-29 10:49 screenrc
-rw-r--r-- 1 yqiang staff 6.8k 2008-07-04 13:39 vimrc
-rw-r--r-- 1 yqiang staff 5.8k 2008-06-29 17:43 zshrc
Then create a freehg.org account and initialize a public repo there. You can find mine at:
http://freehg.org/u/yqiang/dotfiles/
Now, when you access a new machine, to get all your dotfiles in order, just do:
veritas:~ > hg clone http://freehg.org/u/yqiang/dotfiles/ .dotfiles
requesting all changes
adding changesets
adding manifests
adding file changes
added 22 changesets with 37 changes to 15 files
15 files updated, 0 files merged, 0 files removed, 0 files unresolved
veritas:~ > cd .dotfiles
veritas:~/.dotfiles > python create_symlinks.py
Symlinking /home/yqiang/.dotfiles/zshrc to /home/yqiang/.zshrc
Symlinking /home/yqiang/.dotfiles/gvimrc to /home/yqiang/.gvimrc
Symlinking /home/yqiang/.dotfiles/bash_profile to /home/yqiang/.bash_profile
Symlinking /home/yqiang/.dotfiles/hgignore to /home/yqiang/.hgignore
Symlinking /home/yqiang/.dotfiles/bashrc to /home/yqiang/.bashrc
Symlinking /home/yqiang/.dotfiles/hgrc to /home/yqiang/.hgrc
Symlinking /home/yqiang/.dotfiles/pdbrc to /home/yqiang/.pdbrc
Symlinking /home/yqiang/.dotfiles/ackrc to /home/yqiang/.ackrc
Symlinking /home/yqiang/.dotfiles/screenrc to /home/yqiang/.screenrc
Symlinking /home/yqiang/.dotfiles/vimrc to /home/yqiang/.vimrc
create_symlinks.py is a simple Python script that will create the symlinks for you. Here is the code for it:
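#!/usr/bin/env python
import os

home = os.path.abspath(os.environ['HOME'])
path = os.path.join(home, '.dotfiles')
# Entries in ~/.dotfiles that should not be symlinked into $HOME.
excludes = ['gtk-2.0', 'create_symlinks.py']

for f in os.listdir(path):
    # Skip hidden entries such as the .hg directory.
    if f.startswith('.'):
        continue
    if f not in excludes:
        dst = os.path.join(home, '.' + f)
        # Build the source path from the repo directory rather than the
        # current working directory, so the script works from anywhere.
        src = os.path.join(path, f)
        try:
            print("Symlinking %s to %s" % (src, dst))
            os.symlink(src, dst)
        except Exception as msg:
            print("Failed to symlink %s to %s" % (src, dst))
            print(msg)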
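If the script is run a second time, os.symlink fails for every link that already exists. One possible variation, sketched here under the assumption that skipping existing destinations is the desired behavior, wraps the same loop in a function; link_dotfiles is a made-up name and is not part of the repository above.

```python
import os


def link_dotfiles(repo, home, excludes=()):
    """Symlink each non-hidden entry of repo into home with a leading dot.

    Returns (created, skipped): destinations that were newly linked and
    destinations that already existed and were left untouched.
    """
    created, skipped = [], []
    for name in sorted(os.listdir(repo)):
        if name.startswith('.') or name in excludes:
            continue
        src = os.path.join(repo, name)
        dst = os.path.join(home, '.' + name)
        # lexists is also true for dangling symlinks, which islink/exists miss.
        if os.path.lexists(dst):
            skipped.append(dst)
            continue
        os.symlink(src, dst)
        created.append(dst)
    return created, skipped
```

Calling link_dotfiles(os.path.expanduser('~/.dotfiles'), os.path.expanduser('~'), excludes=('gtk-2.0', 'create_symlinks.py')) would mirror the original script, and running it twice is harmless.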
Tada. All your config files are in place now. If you’re really into it, you can run a cron script that automatically does an hg pull so you don’t even have to think about it.
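A small Python wrapper for that cron job might look like the sketch below. The file name update_dotfiles.py, the function names and the schedule are assumptions for illustration; the only Mercurial features used are the global --repository option and pull --update.

```python
import os
import subprocess


def build_pull_command(repo):
    """Return the hg command that pulls new changesets and updates the working copy."""
    return ['hg', '--repository', repo, 'pull', '--update']


def update_dotfiles(repo=None):
    """Run hg pull --update in the dotfiles repo and return hg's exit code."""
    if repo is None:
        # Assumed location, matching the ~/.dotfiles layout used in this post.
        repo = os.path.join(os.path.expanduser('~'), '.dotfiles')
    return subprocess.call(build_pull_command(repo))


# Hypothetical crontab entry for a script that calls update_dotfiles() hourly:
#   0 * * * * python $HOME/.dotfiles/update_dotfiles.py
```

New dotfiles added to the repository still need a symlink, so the cron script could run create_symlinks.py after a successful pull.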