next up previous
Next: Conclusion Up: Parallel programming and scalability Previous: Cost modelling and performance

Scalability

Programmers take the burden of writing parallel programs to increase speed and memory. The aim of every parallel algorithm designer is to write code that scales linearly, i.e. runs $ p$ times as fast on a $ p$-processor machine. This clearly constitutes an upper bound, if the sequential algorithm is already optimal. Linear scalability is achieved by using good load-balancing, keeping all processors busy all the time and communication costs are minimized.

Data dependency can make optimal speed-up impossible. It determines parallel complexity, the minimum number of steps an algorithm would need to run on a PRAM-computer. This constitutes an upper bound on the maximal speed-up that can be achieved.

There are many different and more sophisticated layouts of parallel implementations possible. The right choice depends on the size of the portfolio and available hardware. For the sake of simplicity in this article we have chosen the brute force approach.

Sketch of the scalability for a parallel brute-force GS: To avoid local extrema, one has to start many times from different grid-points:

  1. Superstep: Broadcast the initial portfolio structure and search-areas, or only the portfolio structure and use random startpoints. Depending on the network architecture (reflected in the value of $ g$), one might use several supersteps and use e.g. a tree-shaped communication form. Asymptotic cost for a 1 phase broadcast: $ l+npg$, where $ n$ is the size of the initial portfolio structure.
  2. Superstep: Now work out gradient searches on all processors for a given time. E.g. every processors performs a set-number of searches. On average this will balance out. Asymptotic cost: $ 1/p \times$ sequential time, as if $ p$ processors work out $ k/p$ searches, $ k$ searches are performed in total. The sequential time is the all dominating factor.

  3. Superstep: Each processor sends its best grid point back to processor 1, which sorts them and gives the final result. Asymptotic cost: $ l+ng+p$. The extra $ p$ arise from chosing the point with best fitness.
As the communication cost, sorting, etc. is negligible for any reasonable number of searches, this algorithm clearly scales linearly with the number of processors used.

Sketch of the scalability for SI:

Superstep: As in the 1-processor mode (see Figure 2), but now performed on all $ p$ processors. Every $ 1000$ or $ 10000$ iterations fit values are exchanged, then the next superstep starts.
Since the cost of data interchange is negligible compared to the cost of the iterations in each superstep, we have scalability as in the GS case.

One typical schoolbook error in this context is not to use a high quality random number generator, assuring independent random number streams on all processors (cf. Mascagni, Ceperley and Srinivasan (1998, 1999)).

Large clusters as well as the rise of grid-computing requires analytic forecastig of run-times to chose the appropriate hardware for the task. There are many potential trade-offs (cf. Jarvis et al. (2002, 2003) and Roehrl (1998)): time versus money, etc. Our paper has shown that a pragmatic approach can take advantage of developments in computerscience to enable the exploration of new portfolio optimization techniques using parallel computing techniques.


next up previous
Next: Conclusion Up: Parallel programming and scalability Previous: Cost modelling and performance
2003-10-24 Approximity