Václav Pech
Speculative calculations
Today a condensed post on a simple yet useful experimental concept from the next GPars version - speculative calculations.
A problem to solve
Imagine you need to perform a long-running task like calculating a time-expensive function or reading data from a file, database or internet. In adition to that, you know of several good ways (e.g. functions or urls) to achieve your goal. However, they are not all equal. Although they return back the same (as far as your needs are concerned) result, they may all take different amount of time to complete and some of them may even fail (due to e.g. network issues). And what's even worse, no-one is going to tell you up-front, which path (algorithm, url, file) will give you the solution first nor which paths lead to no solution at all. Shall I run quick sort or merge sort on my list? Which url will work best? Is this service available at its primary location or should I use the backup one?
Speculations to the rescue (for some)
GPars speculations give you the option to try all the available alternatives in parallel and so get the result from the fastest successful path, silently ignoring the slow or broken ones. This is what the speculate() methods on GParsPool and GParsExecutorsPool() can do.
def numbers = ...Here we're performing both quick sort and merge sort concurrently, while getting the result of the faster one. Given the parallel resources available these days on mainstream hardware, on many setups running the two functions in parallel will not have dramatic impact on the speed of calculation of either one, and so we get the result in about the same time as if we ran solely the faster of the two calculations. And we get the result sooner than when running the slower one. Yet we didn't have to know up-front, which of the two sorting algorithms would perform better on our data. Thus we speculated.
def quickSort = ...
def mergeSort = ...
def sortedNumbers = speculate(quickSort, mergeSort)
Similarly, downloading a document from multiple network sources of different speed and reliability would somewhat look like this:
import static groovyx.gpars.GParsPool.speculateTo become a concurrent speculator, checkout the recent GPars 0.11 snapshot and consider reading the snapshot user guide. We're looking forward to your feedback on the project mailinglist.
import static groovyx.gpars.GParsPool.withPool
def alternative1 = {
'http://www.dzone.com/links/index.html'.toURL().text
}
def alternative2 = {
'http://www.dzone.com/'.toURL().text
}
def alternative3 = {
'http://www.dzzzzzone.com/'.toURL().text //wrong url
}
def alternative4 = {
'http://dzone.com/'.toURL().text
}
withPool(4) {
println speculate([alternative1, alternative2, alternative3, alternative4]).contains('groovy')
}
Posted at 06:19AM Sep 03, 2010 by Vaclav Pech in Groovy | Comments[6]







Posted by mr-dustmite on September 04, 2010 at 06:46 PM CEST #
Normally in Java it is hard to abort a thread in progress since you either need to recheck the interrupted status (almost nobody is doing that) or add some other field that needs to be checked from time to time. But with an STM the transaction openForRead/openForWrite could be used for this functionality and by adding a 'abortOnly' field on the transaction, the basic infrastructure I there.
Peter Veentjer
Multiverse: Software Transactional Memory for Java
http://multiverse.codehaus.org
Posted by Peter Veentjer on September 09, 2010 at 02:45 PM CEST #
Posted by Vaclav Pech on September 09, 2010 at 09:56 PM CEST #
It doesn't rely in instrumentation at all (will be added in the future for easy Java integration, but the instrumentation only accesses the internal api's). The easiest way to make use of the api's is to make use of managed refs (just like Clojure, Akka and the forthcoming Scala integration).
So you can say stuff like:
Ref<String> name = ...
atomic{
name.set(name.get()+"foo");
}
Posted by Peter Veentjer on September 10, 2010 at 10:08 AM CEST #
As for the abortOnly feature, me being an STM beginner, I wonder whether using this flag only will prevent multiple transactions from completing and assigning a value to the result.
My guess is the transactions will still have to do something like:
if (result==null) result = myResult
else myTx.abort()
shortly before completion.
Posted by Vaclav Pech on September 10, 2010 at 01:04 PM CEST #
It won't help to prevent multiple commits, but we can write an additional synchronization structure for that as well. I already added the atomic commit of multiple transactions using the commitbarriers for Akka transactors, and adding a synchronization structure where only one transaction will complete would not be that hard I think. And the same synchronization structure could coordinate the 'abortOnly' field to make sure that transaction fail faster instead of waiting for them to fail when they want to commit (and figure out that another one already has done that).
Posted by Peter Veentjer on September 10, 2010 at 05:35 PM CEST #