Suppose you have a monolithic process that performs five million identical, calculation-intensive and time-consuming operations, each of which is fully self-contained and independent of the others.
Each of the five million operations receives 10 kB of input data, can take between 5 microseconds and 50 milliseconds to complete, and returns one floating-point result.
How would you structure its implementation in order that the whole process completes in the minimum total elapsed time?
Discuss implementation and make recommendations for each of these target platforms:
- A simple multicore desktop PC
- An AWS instance with a large number of cores
- A PC equipped with a GPU
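One possible starting point for the multicore case, given that per-operation run time varies by four orders of magnitude, is dynamic scheduling: worker threads pull small chunks of operations from a shared atomic counter, so a thread that happens to draw fast operations simply comes back for more. The following is a minimal sketch under stated assumptions, not a reference answer. The names `process_record` and `run_dynamic`, the contiguous `float` input layout, and the default chunk size of 256 are all hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real operation: reduces one fixed-size
// input record to a single floating-point result.
double process_record(const float* record, std::size_t len) {
    double acc = 0.0;
    for (std::size_t i = 0; i < len; ++i) acc += record[i];
    return acc;
}

// Assumes `inputs` holds the records back to back, record_len floats each
// (10 kB would be 2560 floats). Threads claim `chunk` records at a time
// from a shared counter, so uneven run times balance out automatically.
std::vector<double> run_dynamic(const std::vector<float>& inputs,
                                std::size_t record_len,
                                unsigned n_threads,
                                std::size_t chunk = 256) {
    assert(record_len > 0 && n_threads > 0 && chunk > 0);
    const std::size_t n_records = inputs.size() / record_len;
    std::vector<double> results(n_records);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            // Claim the next chunk; relaxed ordering suffices because the
            // counter only hands out disjoint index ranges.
            const std::size_t begin = next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= n_records) break;
            const std::size_t end = std::min(begin + chunk, n_records);
            for (std::size_t i = begin; i < end; ++i)
                results[i] = process_record(inputs.data() + i * record_len, record_len);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();  // join() makes all writes to results visible
    return results;
}
```

A call such as `run_dynamic(inputs, 2560, std::thread::hardware_concurrency())` would use one thread per hardware thread; note that `hardware_concurrency()` may return 0, in which case a fallback value is needed. The chunk size trades scheduling overhead against load balance and would need tuning on the target machine.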
3.1. Please write a program in C++, which illustrates an example of parallelizing nested loops:
3.1.a. Initialise variables with data of your choice.
3.1.b. Replace the logic inside the if conditions with any task you like, e.g., evaluating an analytical expression or running a specific algorithm.
3.1.c. Use any C++ libraries or frameworks to implement your solution, and comment on their benefits.
3.2. For comparison, measure the execution times and results of a sequential implementation and of the parallelized version.
3.3. Please comment on the implementation details and explain why the selected parallel processing/execution model is the most suitable for your data types, hardware architecture, or implemented algorithm.
Source code and any output files should be delivered by email, preferably in a .zip file.
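As an illustration of what tasks 3.1 and 3.2 are asking for, here is a minimal sketch using only the C++ standard library. It flattens a two-level loop into a single index range, splits that range into contiguous blocks across `std::thread` workers, and provides a small timing helper for the sequential versus parallel comparison. The loop body `cell`, the function names, and the static block partitioning are assumptions chosen for illustration; a framework such as OpenMP or a different partitioning scheme may suit a particular solution better.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical loop body: any pure function of (i, j) would do here.
inline double cell(std::size_t i, std::size_t j) {
    return std::sin(0.001 * static_cast<double>(i)) *
           std::cos(0.001 * static_cast<double>(j));
}

// Sequential reference: plain nested loops over a rows x cols grid.
std::vector<double> fill_sequential(std::size_t rows, std::size_t cols) {
    std::vector<double> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[i * cols + j] = cell(i, j);
    return out;
}

// Parallel version: the two loops are flattened into one index k, and each
// thread fills its own contiguous, non-overlapping range of k.
std::vector<double> fill_parallel(std::size_t rows, std::size_t cols,
                                  unsigned n_threads) {
    assert(n_threads > 0);
    const std::size_t total = rows * cols;
    std::vector<double> out(total);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = total * t / n_threads;
        const std::size_t end = total * (t + 1) / n_threads;
        pool.emplace_back([&out, cols, begin, end] {
            for (std::size_t k = begin; k < end; ++k)
                out[k] = cell(k / cols, k % cols);  // recover (i, j) from k
        });
    }
    for (auto& th : pool) th.join();
    return out;
}

// Wall-clock time of a callable, in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A comparison could then time `fill_sequential(rows, cols)` and `fill_parallel(rows, cols, n)` with `time_ms` and check that both return the same values. Static partitioning fits this example because every cell costs roughly the same; for work with uneven cost per iteration, a dynamic scheme is likely the better choice.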