- Multiply 2 matrices using thread-pool, the program will receive 2 csv files as input that must be read and loaded into memory. The matrix product must be burned to another file
- The size of the thread pool to be used will be given
- Each matrix is composed of integers, the files do not have headers, the column separator is the character "," and each line indicates a new row
A thread pool is a group of pre-instantiated, idle threads which stand ready to be given work. These are preferred over instantiating new threads for each task when there is a large number of short tasks to be done rather than a small number of long ones. This prevents having to incur the overhead of creating a thread a large number of times.
When the thread pool is created, it will either instantiate a certain number of threads to make available or create new ones as needed depending on the needs of the implementation.
When the pool is handed a Task, it takes a thread from the container (or waits for one to become available if the container is empty), hands it a Task, and meets the barrier. This causes the idle thread to resume execution, invoking the execute() method of the Task it was given. Once execution is complete, the thread hands itself back to the pool to be put into the container for re-use and then meets its barrier, putting itself to sleep until the cycle repeats.
It will vary by environment, but in simplified terms, you need the following:
- A way to create threads and hold them in an idle state. This can be accomplished by having each thread wait at a barrier until the pool hands it work.
- A container to store the created threads, such as a queue or any other structure that has a way to add a thread to the pool and pull one out.
- A standard interface or abstract class for the threads to use in doing work. This might be an abstract class called Task with an execute() method that does the work and then returns.
Before diving into our solution, we need to understand a basic concept when multipliying matrixes:
If is an matrix and is an matrix,
the matrix product is defined to be the matrix
such that
with ThreadPoolExecutor(max_workers=pSize) as ex:
ex.map(mult, a)
Basically, ThreadPoolExecutor was introduced in Python in concurrent.futures module to efficiently manage and create threads. The pool keeps track and manages the threads lifecycle and schedules them on the programmer's behalf thus making the code much simpler and less buggy.
The argument we are using (max_workers) it's the number of threads aka size of pool. The method we are using (map(fn, *iterables, timeout = None, chunksize = 1)) maps the method and iterables together immediately and will raise an exception concurrent. futures.TimeoutError if it fails to do so within the timeout limit. If the iterables are very large, then having a chunk-size larger than 1 can improve performance when using ProcessPoolExecutor but with ThreadPoolExecutor it has no such advantage, ie it can be left to its default value.
No matter the dimensions of your matrix, this functionality will manage to deliver the complete job. As you can appreciate, the map method will receive the multiplication function and it will send the matrix as well. It should be noted that the function will be reading the rows of matrix .
def mult(filaX):
for i in range(colsB):
mat.append(filaX@b[:,i])
As we mentioned earlier, this function receives matrix rows (one row every time is being called) and it performs dot product with every column in matrix in a very pythonic way
matC = np.array(mat).reshape(rowsA, colsB)
Our answers of the dot product between rows and columns are stored in mat list and then, leveraging python capabilities, we reshape the list into a numpy array to display the results in a csv
- Clone the repo in your computer
git clone https://github.com/Los-angeles-de-Byron/custom_syscall.git
- Try the program with the following command
python matmul.py matA.csv matB.csv <threadpool_size>* <output_file>
*it should be an int value
-
Click here to open the project in replit
-
Before trying to insert the command click on the "Run" button to install necessary libraries. Don't worry if you see an error in the Console tab.
-
Change to the Shell tab and write the following command:
python main.py matA.csv matB.csv <threadpool_size>* <output_file>
*it should be an int value
- And there you go! The display must be the dimensions of each matrix and the elapsed time of the operation. You can go check your <output_file> to see the answer of the multiplication.