Performance models of high performance computing (HPC) applications are important for several reasons. First, they give designers of HPC systems insight into the role that subsystems such as the processor or the network play in determining application performance. Second, they allow HPC centers to match procurements more accurately to resource requirements. Third, they can be used to identify application performance bottlenecks and to expose scalability issues. However, the suitability of a performance model for a particular performance investigation depends on both the accuracy and the cost of the model.
A semi-empirical model previously published by the authors for an astrophysics application was shown to be inaccurate when predicting communication cost for large numbers of processors. It is hypothesized that this deficiency is due to the model's inability to adequately capture communication contention (threshold effects), as well as other un-modeled components such as noise and I/O contention. In this paper we present a new approach that captures these unknown features to improve the predictive capabilities of the model. The approach uses a systematic model error-correction procedure in which evolutionary algorithms find an error correction term that augments the existing model. Four variations of this procedure were investigated, and all produced better results than the original model.
[...] Table: GA/GP parameter settings for MECP (columns: Parameters, Settings).

The error correction term (ect) is a function of the input parameters nx, np, and T. In these cases the parameters are not fixed in the equations, and the unknown machine parameters are found by the GA. The ect itself is not shown here, but it includes operators [...] and input parameters (application and machine parameters). The objective function to be minimized in the GA/GP procedure is a measure of the error between the calculated and measured communication times.

MECP APPLIED TO EVH1 COMMUNICATION MODEL

EVH1 (Enhanced version of Virginia Hydrodynamics) is a finite-difference astrophysics code containing over 6,500 lines of Fortran and MPI (Message Passing Interface). In earlier work, EVH1 runtime was modeled for 2-dimensional and 3-dimensional problems on two different architectures. [...]
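To make the GA step concrete, here is a minimal Python sketch of a real-coded genetic algorithm that searches for two unknown machine parameters by minimizing an error measure between calculated and measured communication times. Everything in it is an assumption for illustration: the hypothetical communication model `comm_model` (messages times latency plus bytes over bandwidth), the search bounds, the sum of squared relative errors used as the objective, the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism), and the synthetic data. The actual EVH1 communication model and the GA settings from the parameter table are not reproduced here.

```python
import random

# Search space in log10 units (hypothetical bounds).
BOUNDS = [(-6.0, -4.0), (8.0, 10.0)]  # log10(latency [s]), log10(bandwidth [B/s])


def comm_model(genes, n_msgs, msg_bytes):
    """Hypothetical communication model: n_msgs * (latency + msg_bytes / bandwidth)."""
    log_lat, log_bw = genes
    return n_msgs * (10.0 ** log_lat + msg_bytes / 10.0 ** log_bw)


def objective(genes, data):
    """Assumed error measure: sum of squared relative errors (calculated vs. measured)."""
    return sum(((comm_model(genes, n, b) - t) / t) ** 2 for n, b, t in data)


def clip(genes):
    """Keep every gene inside its search bounds."""
    return [min(max(g, lo), hi) for g, (lo, hi) in zip(genes, BOUNDS)]


def run_ga(data, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    initial_best = min(objective(g, data) for g in pop)
    for _ in range(generations):
        pop.sort(key=lambda g: objective(g, data))
        nxt = pop[:2]  # elitism: carry the two best individuals over unchanged
        while len(nxt) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = min(rng.sample(pop, 3), key=lambda g: objective(g, data))
            p2 = min(rng.sample(pop, 3), key=lambda g: objective(g, data))
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]  # blend crossover
            child = [g + rng.gauss(0.0, 0.05) for g in child]  # Gaussian mutation
            nxt.append(clip(child))
        pop = nxt
    best = min(pop, key=lambda g: objective(g, data))
    return best, objective(best, data), initial_best


# Synthetic "measurements" generated from known parameters, for illustration only.
true_genes = [-5.3, 9.3]
data = [(n, b, comm_model(true_genes, n, b))
        for n in (10, 100, 1000) for b in (1e3, 1e5, 1e7)]
best, err, initial = run_ga(data)
print("latency [s]:", 10 ** best[0], "bandwidth [B/s]:", 10 ** best[1])
print("objective:", initial, "->", err)
```

Searching in log10 units is a design choice of this sketch: latency and bandwidth differ by many orders of magnitude, and log units let a single mutation width serve both genes.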
[...] This approach starts by developing a semi-empirical model and then improves it further with a genetic programming-based model error correction procedure (MECP). A semi-empirical parallel performance model typically consists of two components: a computation time model and a communication time model. Earlier work on standard semi-empirical modeling of the astrophysics code EVH1 (Enhanced Virginia Hydrodynamics code) showed that while the computation model's predictions were satisfactory, the communication model's predictions, particularly for large processor counts, were poor. This deficiency is probably due to the inability of a typical semi-empirical model to capture difficult-to-model components such as message contention and noise. [...]
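The MECP idea described above can be sketched in Python under stated assumptions: genetic programming evolves an expression tree `ect` over the input parameters, and the corrected prediction is `base_model(env) + ect(env)`. The operator set, the hypothetical `base_model`, the scaled inputs `nx` and `np`, the GP settings (population size, generations, depth limit, crossover rate), and the synthetic "measurements" (which contain an un-modeled np-squared term standing in for contention) are all illustrative choices, not the authors' configuration.

```python
import math
import random

# Operators available to the GP; division is "protected" so a near-zero
# denominator yields 1.0 instead of raising ZeroDivisionError.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y if abs(y) > 1e-12 else 1.0,
}
VARS = ["nx", "np"]  # hypothetical input parameters, scaled to (0, 1]

def random_tree(rng, depth):
    """Grow a random tree: (op, left, right), ("var", name) or ("const", value)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("var", rng.choice(VARS))
        return ("const", rng.uniform(-1.0, 1.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, env):
    if tree[0] == "var":
        return env[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], env), evaluate(tree[2], env))

def depth(tree):
    if tree[0] in OPS:
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices (1 or 2)."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def fitness(ect, data, base_model):
    """Sum of squared errors of the augmented model base_model + ect."""
    err = 0.0
    for env, measured in data:
        diff = measured - (base_model(env) + evaluate(ect, env))
        err += diff * diff  # overflows to inf instead of raising OverflowError
    return err if math.isfinite(err) else math.inf

def run_gp(data, base_model, pop_size=60, generations=40, max_depth=5, seed=1):
    rng = random.Random(seed)
    # The zero tree means "no correction", so generation 0 contains the original model.
    pop = [("const", 0.0)] + [random_tree(rng, 3) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data, base_model))
        elite = pop[: pop_size // 5]  # elitism: the best 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            path, _ = rng.choice(list(subtrees(a)))
            if rng.random() < 0.7:  # subtree crossover: graft a subtree of b
                donor = rng.choice(list(subtrees(b)))[1]
            else:  # subtree mutation: graft a fresh random subtree
                donor = random_tree(rng, 2)
            child = replace_at(a, path, donor)
            if depth(child) <= max_depth:
                children.append(child)
        pop = elite + children
    best = min(pop, key=lambda t: fitness(t, data, base_model))
    return best, fitness(best, data, base_model)

def base_model(env):
    """Hypothetical uncorrected communication model (arbitrary units)."""
    return 0.1 * env["np"] + 0.5 * env["nx"]

# Synthetic "measurements" with an un-modeled, contention-like np**2 term.
data = []
for nx in (64, 128, 256):
    for np_ in (4, 8, 16, 32, 64):
        env = {"nx": nx / 256, "np": np_ / 64}
        data.append((env, base_model(env) + 0.3 * env["np"] ** 2))

best_ect, best_err = run_gp(data, base_model)
base_err = fitness(("const", 0.0), data, base_model)
print("best ect:", best_ect)
print("SSE original:", base_err, "SSE corrected:", best_err)
```

Seeding the population with a zero correction term and keeping an elite means the best corrected model can never fit these training points worse than the original model.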
[...] Again, the model parameter values were obtained with the Levenberg-Marquardt procedure, using the lsqcurvefit function in MATLAB.

GYRO MODEL RESULTS

We developed a semi-empirical model of computation time for the GYRO application by considering the input parameters that contribute significantly to computation time (grid dimensions, time steps, etc.). The model for computation time can be expressed as:

$$t_{comp} = \frac{1}{n_p} \cdot t_s \cdot \left( T_g^{a} \cdot r_g^{b} \cdot t_c \right) \cdot k$$

In the above equation, $a$, $b$, and $t_c$ are constants to be fitted and $k$ is a known constant; $n_p$ = number of processors, $t_s$ = time step, $T_g$ = toroidal grid, $r_g$ = radial grid, $k = 1$ for the B1-std and B2-cy problems, and $k = 2$ for the B3-gtc problem. [...]
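The fitting step can be sketched in Python. This is a small hand-written Levenberg-Marquardt loop standing in for MATLAB's `lsqcurvefit`, not a reproduction of it: it fits the exponents a and b and the constant t_c of the computation-time model to synthetic, noise-free timings generated from made-up parameter values (with k = 1 and t_s = 100). Fitting log t_c rather than t_c is an assumption of this sketch, made so the three parameters have similar scales.

```python
import math


def t_comp(params, n_p, t_s, T_g, r_g, k):
    """t_comp = (1/n_p) * t_s * T_g**a * r_g**b * t_c * k, with params = (a, b, log t_c)."""
    a, b, log_tc = params
    return t_s * k / n_p * T_g ** a * r_g ** b * math.exp(log_tc)


def solve(A, y):
    """Solve a small dense system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_lm(data, params, lam=1e-3, iters=200):
    """Levenberg-Marquardt least-squares fit of (a, b, log t_c).

    data: list of (n_p, t_s, T_g, r_g, k, measured_time) tuples."""
    def sse(p):
        total = 0.0
        for *x, y in data:
            d = y - t_comp(p, *x)
            total += d * d
        return total

    params = list(params)
    err = sse(params)
    for _ in range(iters):
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for *x, y in data:
            f = t_comp(params, *x)
            T_g, r_g = x[2], x[3]
            J = [f * math.log(T_g), f * math.log(r_g), f]  # df/da, df/db, df/dlog(t_c)
            r = y - f
            for i in range(3):
                JTr[i] += J[i] * r
                for j in range(3):
                    JTJ[i][j] += J[i] * J[j]
        # Marquardt damping: inflate the diagonal of J^T J by a factor (1 + lam).
        A = [[JTJ[i][j] * (1 + lam) if i == j else JTJ[i][j] for j in range(3)] for i in range(3)]
        step = solve(A, JTr)
        trial = [p + s for p, s in zip(params, step)]
        trial_err = sse(trial)
        if trial_err < err:  # accept the step and move toward Gauss-Newton
            params, err, lam = trial, trial_err, lam / 10
        else:  # reject the step and move toward gradient descent
            lam *= 10
    return params, err


# Synthetic, noise-free "measurements" from known parameters (illustration only).
true_params = (1.2, 0.9, math.log(2e-6))
data = [(n_p, 100, T_g, r_g, 1, t_comp(true_params, n_p, 100, T_g, r_g, 1))
        for n_p in (8, 16, 32) for T_g in (16, 32, 64) for r_g in (100, 200, 400)]
(a, b, log_tc), err = fit_lm(data, [1.0, 1.0, math.log(1e-6)])
print(f"a={a:.4f}  b={b:.4f}  t_c={math.exp(log_tc):.3e}  SSE={err:.2e}")
```

The full-factorial choice of T_g and r_g values matters: if either grid dimension were held constant, its exponent could not be separated from t_c.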
[...] Since we were interested in a single model that predicts well for all three GYRO problems, we applied MECP to improve the prediction performance of this model. As with EVH1, we tested all four cases of MECP on the GYRO communication model, and all four performed nearly equally well. In total, 15 random trials were run for each case. Of these 15 trials, case 1 produced better results than the original model 100% of the time; cases [...], [...], and [...] did so [...] of the time. [...]
[...] The question remains whether these same models could be applied to a different architecture simply by fitting or adjusting only the machine parameters (e.g., latency and bandwidth). A test of this hypothesis would indicate how well our previously developed models delineated the application-specific [...]

CONCLUSIONS

The following points summarize the results and conclusions of this paper: We have developed and demonstrated a new Model Error Correction Procedure (MECP) approach for modeling the performance of parallel applications, using two HPC applications on two platforms. [...]