The Finite Difference Time Domain (FDTD) method is widely used in modern computational electrodynamics. Its traditional implementation is the Yee scheme, which uses a second-order explicit synchronous mesh update, and its performance is memory-bound. Several memory-efficient approaches have appeared to overcome this limit; for example, new classes of Locally Recursive Nonlocally Asynchronous (LRnLA) algorithms for explicit stencils of different orders are regularly proposed.
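For reference, the synchronous Yee update can be sketched in 1D. This is a minimal illustration, not the authors' code. It assumes normalized units, a hypothetical function name `yee_1d`, a Courant number `s`, and fixed (PEC-like) end nodes. Each time step sweeps the entire mesh twice, once for H and once for E, performing only a few flops per value loaded and stored. This is why the traditional scheme is limited by memory bandwidth once the mesh no longer fits in cache.

```python
import numpy as np


def yee_1d(ez, hy, steps, s=0.5):
    """Leapfrog Yee update for a 1D (Ez, Hy) field in normalized units.

    ez: N samples of Ez at integer nodes i
    hy: N-1 samples of Hy at half nodes i+1/2
    s:  Courant number c*dt/dx (stable for s <= 1 in 1D)
    The end nodes of Ez are never updated (hypothetical PEC-like boundary).
    """
    ez = np.array(ez, dtype=float)  # copies, so the inputs are not modified
    hy = np.array(hy, dtype=float)
    for _ in range(steps):
        # Sweep 1: every H sample is updated from the current E layer
        hy += s * (ez[1:] - ez[:-1])
        # Sweep 2: every interior E sample is updated from the new H layer
        ez[1:-1] += s * (hy[1:] - hy[:-1])
    return ez, hy
```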
LRnLA algorithms are based on representing the problem as a space-time dependency graph and recursively decomposing it into subgraphs.
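A recursive space-time decomposition can be illustrated on a simpler scalar 3-point stencil, where the value at (t+1, x) depends on (t, x-1), (t, x) and (t, x+1). The sketch below is not an LRnLA algorithm. It uses the cache-oblivious trapezoidal decomposition of Frigo and Strumpen, swapped in as a well-known example of recursively splitting a stencil's dependency graph. A trapezoid is cut in space along a line of slope -1 when it is wide enough, and cut in time otherwise. Every cut respects the dependencies, so the update order differs from the synchronous sweep while the computed values stay the same. The function names and the averaging stencil are hypothetical.

```python
import numpy as np


def kernel(u, t, x):
    """Update cell x from time layer t to t+1 (two buffers, indexed by t % 2)."""
    src, dst = u[t % 2], u[(t + 1) % 2]
    dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]


def walk(u, t0, t1, x0, dx0, x1, dx1):
    """Process the trapezoid t0 <= t < t1, x0 + dx0*(t-t0) <= x < x1 + dx1*(t-t0)."""
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(u, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Wide trapezoid: space cut along a line of slope -1, left part first
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk(u, t0, t1, x0, dx0, xm, -1)
            walk(u, t0, t1, xm, -1, x1, dx1)
        else:
            # Tall trapezoid: time cut into a lower and an upper half
            s = dt // 2
            walk(u, t0, t0 + s, x0, dx0, x1, dx1)
            walk(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def run_recursive(u0, steps):
    u = np.array([u0, u0], dtype=float)  # both layers hold the fixed end values
    walk(u, 0, steps, 1, 0, len(u0) - 1, 0)  # interior cells 1 .. n-2
    return u[steps % 2].copy()


def run_synchronous(u0, steps):
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
    return u
```

`run_recursive(u0, steps)` and `run_synchronous(u0, steps)` return the same array. Only the order of the cell updates, and therefore the data locality, differs.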
Depending on the computer architecture, the decomposition is organized into space-time shapes that maximally preserve data locality, i.e. that are conformal to the memory hierarchy of the computer.
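As a rough illustration of what "conformal to the memory hierarchy" can mean in practice, one can size the subgraph at each level of the decomposition so that the data it touches fits into one cache level. The numbers below are assumptions, not measurements. They assume 6 single-precision field components per cell (24 bytes, ignoring material coefficients), only half of each cache used for field data, and hypothetical cache sizes of 32 KiB, 1 MiB and 32 MiB. The helper names are illustrative.

```python
def max_cube_edge(cache_bytes, bytes_per_cell=24, fill=0.5):
    """Largest edge a (in cells) such that an a x a x a block fits in fill * cache_bytes."""
    budget_cells = int(cache_bytes * fill) // bytes_per_cell
    a = 0
    while (a + 1) ** 3 <= budget_cells:
        a += 1
    return a


# Hypothetical cache sizes in bytes, innermost level first
HIERARCHY = {"L1": 32 * 1024, "L2": 1024 ** 2, "L3": 32 * 1024 ** 2}


def tile_edges(hierarchy=HIERARCHY, bytes_per_cell=24, fill=0.5):
    """Block edge per cache level; a recursive decomposition would nest these sizes."""
    return {level: max_cube_edge(size, bytes_per_cell, fill)
            for level, size in hierarchy.items()}


print(tile_edges())  # {'L1': 8, 'L2': 27, 'L3': 88}
```

This only sizes the spatial footprint. Space-time shapes also extend in time, so their actual working set depends on their height as well.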
The LRnLA algorithms are suitable for processing large spatial meshes since, unlike the traditional synchronous mesh update, their efficiency (number of mesh cells processed per unit time) does not drop drastically as the total mesh size grows.
The algorithms of the LRnLA family differ mainly in which types of parallelism and which levels of the memory subsystem hierarchy they focus on. As a result, each of the current classes may be advantageous under different conditions.
We are testing various approaches to implementing LRnLA algorithms for FDTD codes on CPU architectures, including many-core clusters. The aim is to use LRnLA algorithms of different classes to efficiently exploit the CPU hierarchy and the available types of parallelism. In this work, we present the results of several current implementations and show how performance depends on problem size on different computer architectures. We demonstrate that it is possible to drastically increase the operational intensity of the FDTD code and to shift finite-difference simulation closer to the compute-bound domain.
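The link between operational intensity and the compute-bound domain can be made concrete with the roofline model, where attainable performance is min(peak, intensity × bandwidth). The figures below are illustrative assumptions, not results from this work. They assume roughly 36 flops and 48 bytes of DRAM traffic per 3D Yee cell update (6 single-precision components, each read and written once), and a hypothetical CPU with 1000 GFLOP/s peak and 100 GB/s memory bandwidth. The parameter `steps_per_load` is also an idealization: it counts the time steps performed per load of the data from DRAM, ignoring halo overhead.

```python
PEAK_GFLOPS = 1000.0   # hypothetical peak arithmetic throughput
BANDWIDTH_GBS = 100.0  # hypothetical DRAM bandwidth


def operational_intensity(flops_per_cell=36, bytes_per_cell=48, steps_per_load=1):
    """Flops per byte of DRAM traffic when each load serves steps_per_load updates."""
    return flops_per_cell * steps_per_load / bytes_per_cell


def roofline(oi, peak=PEAK_GFLOPS, bw=BANDWIDTH_GBS):
    """Attainable GFLOP/s under the roofline model."""
    return min(peak, oi * bw)


ridge = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity where the two limits meet
for reuse in (1, 4, 16):
    oi = operational_intensity(steps_per_load=reuse)
    kind = "compute" if oi >= ridge else "memory"
    print(f"reuse={reuse:2d}  OI={oi:5.2f} flop/B  {roofline(oi):6.1f} GFLOP/s  ({kind}-bound)")
```

Under these assumptions a single step per load gives 0.75 flop/byte and 75 GFLOP/s. A reuse of 16 steps gives 12 flop/byte, which is past the ridge point of 10 flop/byte, so the estimate becomes compute-bound.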