Václav Pech : Weblog

Václav Pech

Friday Sep 03, 2010

Speculative calculations

Today a condensed post on a simple yet useful experimental concept from the next GPars version - speculative calculations.

A problem to solve

Imagine you need to perform a long-running task like calculating a time-expensive function or reading data from a file, database or internet. In adition to that, you know of several good ways (e.g. functions or urls) to achieve your goal. However, they are not all equal. Although they return back the same (as far as your needs are concerned) result, they may all take different amount of time to complete and some of them may even fail (due to e.g. network issues). And what's even worse, no-one is going to tell you up-front, which path (algorithm, url, file) will give you the solution first nor which paths lead to no solution at all. Shall I run quick sort or merge sort on my list? Which url will work best? Is this service available at its primary location or should I use the backup one?

Speculations to the rescue (for some)

GPars speculations give you the option to try all the available alternatives in parallel and so get the result from the fastest successful path, silently ignoring the slow or broken ones. This is what the speculate() methods on GParsPool and GParsExecutorsPool() can do.

def numbers = ...
def quickSort = ...
def mergeSort = ...
def sortedNumbers = speculate(quickSort, mergeSort)
Here we're performing both quick sort and merge sort concurrently, while getting the result of the faster one. Given the parallel resources available these days on mainstream hardware, on many setups running the two functions in parallel will not have dramatic impact on the speed of calculation of either one, and so we get the result in about the same time as if we ran solely the faster of the two calculations. And we get the result sooner than when running the slower one. Yet we didn't have to know up-front, which of the two sorting algorithms would perform better on our data. Thus we speculated.

Similarly, downloading a document from multiple network sources of different speed and reliability would somewhat look like this:

import static groovyx.gpars.GParsPool.speculate
import static groovyx.gpars.GParsPool.withPool

def alternative1 = {
'http://www.dzone.com/links/index.html'.toURL().text
}

def alternative2 = {
'http://www.dzone.com/'.toURL().text
}

def alternative3 = {
'http://www.dzzzzzone.com/'.toURL().text //wrong url
}

def alternative4 = {
'http://dzone.com/'.toURL().text
}

withPool(4) {
println speculate([alternative1, alternative2, alternative3, alternative4]).contains('groovy')
}
To become a concurrent speculator, checkout the recent GPars 0.11 snapshot and consider reading the snapshot user guide. We're looking forward to your feedback on the project mailinglist.

Comments:

That's wonderful to see that GPars has expanded to include this. Thank you for the quick and concise description and examples!

Posted by mr-dustmite on September 04, 2010 at 06:46 PM CEST #

I think it could be combined with an stm as well. Each section is executed under its own transactions and once one transaction has found a successful solution, the others can be aborted and the changes they made will never be committed.

Normally in Java it is hard to abort a thread in progress since you either need to recheck the interrupted status (almost nobody is doing that) or add some other field that needs to be checked from time to time. But with an STM the transaction openForRead/openForWrite could be used for this functionality and by adding a 'abortOnly' field on the transaction, the basic infrastructure I there.

Peter Veentjer
Multiverse: Software Transactional Memory for Java
http://multiverse.codehaus.org

Posted by Peter Veentjer on September 09, 2010 at 02:45 PM CEST #

Thank you, Peter, for the tip. I've never thought about STM this way. Employing STM as a vehicle to control parallel speculations. In fact, unlike with the current model, we could let the parallel speculations share the same (transactional) data.

Posted by Vaclav Pech on September 09, 2010 at 09:56 PM CEST #

If you want we can set something up. I'm currently working on a completely new STM engine that is better scalable and better performing (will be part of Multiverse 0.7) and it already contains the 'abortOnly' for just this purpose (although the field isn't volatile.. but I can change that).

It doesn't rely in instrumentation at all (will be added in the future for easy Java integration, but the instrumentation only accesses the internal api's). The easiest way to make use of the api's is to make use of managed refs (just like Clojure, Akka and the forthcoming Scala integration).

So you can say stuff like:

Ref<String> name = ...

atomic{
name.set(name.get()+"foo");
}

Posted by Peter Veentjer on September 10, 2010 at 10:08 AM CEST #

It's awesome to hear about Multiverse progressing. It's been on the GPars radar for quite a while and I think the time for GPars to adopt an STM implementation may not be far away. Let's move this discussion to email.

As for the abortOnly feature, me being an STM beginner, I wonder whether using this flag only will prevent multiple transactions from completing and assigning a value to the result.
My guess is the transactions will still have to do something like:
if (result==null) result = myResult
else myTx.abort()
shortly before completion.

Posted by Vaclav Pech on September 10, 2010 at 01:04 PM CEST #

I've added you to my gmail contacts lists (I'm the hairy monkey: alarmnummer).


It won't help to prevent multiple commits, but we can write an additional synchronization structure for that as well. I already added the atomic commit of multiple transactions using the commitbarriers for Akka transactors, and adding a synchronization structure where only one transaction will complete would not be that hard I think. And the same synchronization structure could coordinate the 'abortOnly' field to make sure that transaction fail faster instead of waiting for them to fail when they want to commit (and figure out that another one already has done that).

Posted by Peter Veentjer on September 10, 2010 at 05:35 PM CEST #

Post a Comment:
Comments are closed for this entry.

Calendar

Feeds

Search

Links

Navigation

Speaking




Creative Commons