dsage news


It’s been a while since I’ve blogged about dsage so here’s a big braindump on what’s been happening…

I was really happy with the amount of activity and interest in distributed computing at Sage Devel Days 1. I think the major participants were William Stein, Glenn Tarbox and Bill Furnish.

As the week went on, it became clear that dsage as it stands right now does not fulfill the needs of some (maybe even many, or most) users. As a result, I am aware of several “next gen” dsage proposals, such as dsageNG and some stuff that Bill wrote (I think by the time dev1 was over, Bill was at dsage4 ;-)). Bill and I chatted a bit about the current architecture of dsage, and the major problem he saw was that dsage workers used polling, so there could be a significant delay (i.e., seconds) between when a new job arrived and when the workers started processing it.

I thought about this for a while and I think that he’s absolutely right. Therefore, I’ve rewritten parts of the worker/server code so now workers will start on a job immediately. This should bring down the overhead of running dsage jobs considerably. 
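To make the polling-versus-immediate-start difference concrete, here is a minimal sketch in plain Python. It is not dsage's code: the standard library's queue and threading modules stand in for the Twisted-based server, and push_worker is a hypothetical name. The point is only that a worker blocked on a queue wakes up the moment a job arrives, instead of sleeping between checks.

```python
import queue
import threading
import time

def push_worker(jobs, results):
    """Block on the queue and start each job as soon as it arrives."""
    while True:
        job = jobs.get()              # blocks; there is no sleep/poll interval
        if job is None:               # sentinel value: shut the worker down
            break
        func, args, submitted = job
        latency = time.monotonic() - submitted
        results.append((func(*args), latency))

jobs = queue.Queue()
results = []
worker = threading.Thread(target=push_worker, args=(jobs, results))
worker.start()

# Submit one job (25 squared), then tell the worker to stop.
jobs.put((pow, (25, 2), time.monotonic()))
jobs.put(None)
worker.join()

value, latency = results[0]
print(value, latency)
```

With a polling worker, the latency would be bounded below by however long the worker sleeps between checks; here it is only the cost of handing the job across the queue.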

The rewrite is also more idiomatically Twisted, since it uses Twisted’s async process communication. It was actually surprisingly easy to write a worker pool using the existing tools in the framework.

I also added a convenience function called eval_function() that allows you to submit a live function as a job. This works for any function that can be pickled, provided its arguments can be pickled as well. For example:

sage: def f(n):
...     return n*n
...
sage: j = d.eval_function(f, ((25,),{}), job_name='square')
sage: j
625
sage: print j.wall_time
0:00:00.144780

This is much, much faster than the performance before the rewriting of the workers.
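The pickling requirement is easier to see with a small stand-alone sketch. This is an illustration of the general idea, not how eval_function() is implemented; pack_job and run_job are hypothetical names. Note that Python's pickle stores a function as a reference to its module and name rather than its code, so the sketch uses operator.mul, which any process can import.

```python
import operator
import pickle

def pack_job(func, args=(), kwargs=None):
    """Client side: serialize the callable together with its arguments."""
    return pickle.dumps((func, args, kwargs or {}))

def run_job(payload):
    """Worker side: deserialize the job and call the function."""
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)

# The payload is plain bytes, so it could be sent over any transport.
payload = pack_job(operator.mul, (25, 25))
result = run_job(payload)
print(result)
```

If either the function or one of its arguments cannot be pickled, pickle.dumps raises an error on the client side, before anything is sent.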

Having eval_function also makes it really easy to do the map part of map reduce (it’s also being referred to as scatter/gather). For example:

sage: jobs = d.map(f, [10..20])
sage: jobs
[100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
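For readers who want to try the scatter/gather pattern without a dsage server, here is a rough local analogue. It assumes a thread pool from Python's concurrent.futures in place of dsage workers, and scatter_gather is a hypothetical helper name, not part of dsage.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def f(n):
    return n * n

def scatter_gather(func, inputs, max_workers=4):
    """Scatter the inputs over a pool, then gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))

# Map step: the same inputs as [10..20] in Sage.
squares = scatter_gather(f, range(10, 21))

# Reduce step: combine the gathered results into one value.
total = reduce(operator.add, squares)

print(squares)
print(total)
```

The map step is the part that benefits from being distributed; the reduce step here runs locally on the gathered list.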

Also, dsage now supports the new @parallel convenience decorator that William wrote:

sage: P = parallel(p_iter = d.parallel_iter)
sage: @P
....: def f(n,m):
....:     return n+m
....: 
sage: f([(1,2), (5, 10/3)])
[((1, 2), 3), ((5, 10/3), 25/3)]
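The shape of that output (each input tuple paired with its result) can be reproduced with a small decorator. This is only a sketch of the pattern, not William's implementation: parallel_sketch is a hypothetical name, a local thread pool stands in for d.parallel_iter, and Fraction stands in for Sage's exact rationals.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def parallel_sketch(max_workers=4):
    """Build a decorator that runs a function over many argument tuples."""
    def decorate(func):
        def run_all(arg_tuples):
            arg_tuples = list(arg_tuples)
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                # Each tuple is unpacked into one call of the wrapped function.
                results = list(pool.map(lambda args: func(*args), arg_tuples))
            # Pair every input tuple with its result, in input order.
            return list(zip(arg_tuples, results))
        return run_all
    return decorate

@parallel_sketch()
def add(n, m):
    return n + m

pairs = add([(1, 2), (5, Fraction(10, 3))])
print(pairs)
```

Swapping the pool for a different backend only requires changing the body of run_all, which is roughly the role the p_iter argument plays in the example above.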

William and I both agree that the strategy for Sage should be to have something that is very fast on local multicore machines (the multiprocessing module comes to mind) while also having something that will work on both local clusters and WANs (dsage).

If anyone is interested in helping me to test out the new version and making it more robust, please let me know!

I’m hoping that this stuff will make it into the next major release of Sage, since I will be going on vacation for 6 weeks (hooray) on July 14th.


3 Responses to “dsage news”

  2. Mike Hansen Says:
    July 5th, 2008 at 9:52 pm

    Excellent work Yi! Are the patches up on http://trac.sagemath.org ?

  3. Yi Qiang Says:
    July 6th, 2008 at 12:49 pm

    @Mike Hansen
    Not yet. I still have to write docs/unit tests, and I am waiting for the next alpha release from mabshoff so I have something to base the patches against.
