Date of Award

Fall 12-2021

Level of Access Assigned by Author

Open-Access Thesis

Degree Name

Master of Science (MS)


Computer Engineering


Bruce Segee

Second Committee Member

Vince Weaver

Third Committee Member

Chaofan Chen


Machine learning is a rapidly growing field that has become increasingly common in recent years. Because machine learning is computationally demanding, many aspects of the field still need research. TensorFlow was developed to build and analyze neural network computations, and it is used especially in deep learning, one of the branches of machine learning. This work examines the performance of a deep learning model trained on a very large dataset with TensorFlow. It compares run time and speed when the model runs on CPUs and on GPUs, since run time is an important factor in deep learning projects. The goal is to find the most efficient machine and platform for running neural network computations. TensorFlow provides the resources and operations needed to process these computations. TensorFlow has two major versions, 1.0 and 2.0. This work uses TensorFlow 2.0, which is easier to code, faster for building models, and faster to train. TensorFlow 2.0 also provides methods for distributing a run across multiple CPUs and multiple GPUs, using a strategy scope to run the model in parallel. The hardware used in this work consists of two clusters, referred to as BlueMoon (CPU) and DeepGreen (GPU), which were developed by the University of Vermont for artificial intelligence research. The results show that the performance of training the model on a large dataset improves each time the number of processors increases. The speedup is highest when training with a large batch size on a higher number of processors. When GPUs are added to these processors, the model runs faster, especially for larger batch sizes. When the model runs on GPUs, it still requires a CPU to complete the computation.
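The strategy-scope mechanism described above can be sketched roughly as follows. This is a minimal, hypothetical example that assumes `tf.distribute.MirroredStrategy` (one machine, one replica per visible GPU, or a single CPU replica when no GPU is visible); the abstract does not say which strategy class the thesis used, and for multi-node runs TensorFlow also offers `tf.distribute.MultiWorkerMirroredStrategy`. The synthetic data, layer sizes, and batch sizes are placeholders, not the thesis's dataset or model. The sketch shows two points the abstract relies on: the model is built and compiled inside `strategy.scope()`, and each global batch is divided among the replicas.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU of one machine;
# with no GPU visible it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Hypothetical sizes for illustration only (not the thesis's setup).
NUM_SAMPLES = 1024
NUM_FEATURES = 32
NUM_CLASSES = 10
PER_REPLICA_BATCH = 64
# The global batch is split evenly across the replicas at each step.
GLOBAL_BATCH = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

# Synthetic random data standing in for a real dataset.
rng = np.random.default_rng(0)
x = rng.standard_normal((NUM_SAMPLES, NUM_FEATURES)).astype("float32")
y = rng.integers(0, NUM_CLASSES, size=NUM_SAMPLES)

dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(NUM_SAMPLES)
    .batch(GLOBAL_BATCH)
)

# Variables created inside the scope (model weights, optimizer state)
# are mirrored across all replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_FEATURES,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# fit() distributes each global batch across the replicas automatically.
history = model.fit(dataset, epochs=1, verbose=0)
```

Because each replica processes `GLOBAL_BATCH / num_replicas_in_sync` samples per step, keeping the per-replica batch fixed and growing the global batch with the device count is one common way to put added GPUs to work; this is one plausible reading of why larger batch sizes benefit most from extra devices, not a claim about the thesis's exact configuration.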
Using fewer CPUs to distribute the data across multiple GPUs yields a higher speedup than using more CPUs to distribute the data across the same number of GPUs. The results also compare the run time of the classification model on the same dataset when run on different machines, with or without an accelerator, and with the two TensorFlow versions. The comparison shows that TensorFlow 2.0 performs better than TensorFlow 1.0. Running the model with an accelerator achieves a greater speedup, and running the model on the clusters achieves higher performance with both TensorFlow versions.
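The speedup comparisons above can be made concrete with the standard definitions: speedup S = T_baseline / T_run, and parallel efficiency E = S / p for p processors. The abstract does not state the exact formula the thesis uses, so this is a small sketch under that assumption; the function names are hypothetical, and the run times in the usage block are placeholders, not measurements from this work.

```python
def speedup(baseline_seconds: float, run_seconds: float) -> float:
    """Speedup S = T_baseline / T_run; values above 1 mean the run is faster."""
    if baseline_seconds <= 0 or run_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / run_seconds


def parallel_efficiency(baseline_seconds: float, run_seconds: float, processors: int) -> float:
    """Efficiency E = S / p; 1.0 means perfectly linear scaling."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(baseline_seconds, run_seconds) / processors


if __name__ == "__main__":
    # Hypothetical run times in seconds, keyed by processor count.
    # These are placeholders, NOT results from the thesis.
    times = {1: 100.0, 2: 50.0, 4: 40.0}
    t1 = times[1]
    for p, tp in sorted(times.items()):
        s = speedup(t1, tp)
        e = parallel_efficiency(t1, tp, p)
        print(f"p={p}: speedup={s:.2f}, efficiency={e:.3f}")
```

With these placeholder times the loop prints a speedup of 2.50 and an efficiency of 0.625 at p=4, illustrating how speedup can keep rising with more processors even as efficiency falls below linear scaling.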