

# بسم الله الرحمن الرحيم



-C-02-50-2-





شبكة المعلومات الجامعية التوثيق الالكتروني والميكرونيلم





# جامعة عين شمس

التوثيق الإلكتروني والميكروفيلم

## قسم

نقسم بالله العظيم أن المادة التي تم توثيقها وتسجيلها علي هذه الأقراص المدمجة قد أعدت دون أية تغيرات



يجب أن

تحفظ هذه الأقراص المدمجة يعيدا عن الغيار













بالرسالة صفحات لم ترد بالأصل



# Memory/Bus Arbitration in Multiple-Bus Multiprocessor Systems

Thesis submitted in accordance with the requirements of the Department of Mathematics Faculty of Science, Minufiya University

for M. Sc. degree in Computer Science

#### BY Sherif Said El-Etrepy

Demonstrator, Dept. of Mathematics. (Pure Mathematics & Computer Science), Faculty of Science, Minufiya University

#### SUPERVISORS

Prof. Dr. Fawzy Ali Torkey
Dept. of Computer Science & Engineering
Faculty of Electronic Engineering,
Minufiya University

10rkeel

Dr. Ahmed Hassan Ali Dept. of Mathematics, Faculty of Science Minufiya University

Dr. Abd EL-Aziz EL-Sherbiny
Dept. of Mathematics, Faculty of Science,

Minufiya University

A Elshedins

وغير المؤالة التعالي المؤالة التعالي المؤالة ا



﴿وقل رب ذدنی علما﴾



صدقاللهالعظيم



To my lovely
My lifeloug
My father, my mother
My brothers, my sisters

AND TO MY WIFE

Who has spent two years from her life to stand beside me,

I APPRECIATE HER KINDNESS









### ACKNOWLEDGMENT



First of all, praise and thanks to God for every thing accursed or to be accursed in my life.

To all who helped me directly or indirectly in bringing this thesis to light, I send my great appreciation and gratitude to all of them; with some special regards to:

**Prof. Dr. Fawzy Ali Torkey**, Professor of Computer Engineering, Department of Computer Science and Engineering, Faculty of Electronic Engineering, Minufiya University, who taught, helped and encouraged me a lot through out the days of work on the thesis, until I reached the desired standard. I cannot fulfill him his true rewards.

At the same time, I would like to present my gratitude to *Prof. Dr.* Ahmed Hassan Ali, Associate Professor of Department of Mathematics, Faculty of Science, Minufiya University, who gave me a lot of his experience, time, and support for the sake of the thesis.

I would like to express my deep appreciation to **Dr. Abd EL-Aziz EL-Sherbiny,** Department of Mathematics, Faculty of Science, Minufiya University. Who reviled every obstacle and gave me all the support that he can do to his little brother. My best wishes for him.

I want to thank my colleges Ahmed Zaid, Passent, Said, Hany, Ahmed Ghoneim and Moustafa for their help they offered. Finally, to all Professors, Lectures, and demonstrators in the Department of Mathematics and Computer Science, Faculty of Science, Minufiya University, my great appreciation and gratitude for the encouragement and help they all gave me.



#### Memory/Bus Arbitration in Multiple-Bus Multiprocessor Systems

M.Sc. Thesis by

Sherif Said EL-Etrepy,

Dept. of Mathematics and Computer Science, Faculty of Science, The University of Minufiya

#### ABSTRACT

performance and behavior of tightly-coupled multiple-bus multiprocessor systems are modeled, developed and studied in this thesis. These systems consist of P processors and M memory modules. Each processor may have its own private cache and local I/O. Processors and memory modules are interconnected via B global buses, where each bus is connected to all processors and to all memory modules. Processors communicate through the shared memory modules. Such flexibility to access shared memory causing memory access conflicts that tend to degrade system performance. In these systems, processors may also contend for the path to the memory module. A theoretical stochastic model is developed for predicting the system performance in the presence of such memory/bus conflicts. Hardware arbiters are employed to resolve memory access conflicts and to allocate the available buses. The blocked requests due to memory/bus arbitration are not ignored but resubmitted during the next memory cycle. The performance of the modeled system is, generally, measured in terms of some popular criteria, such as effective memory bandwidth and acceptance probability. Many interconnection networks have been proposed for interconnecting processors and memory modules. These range from a singlebus to a fully connected crossbar. The simplest multiprocessor system and easiest to modify is one in which all processors are connected in parallel to a single-bus and all share this communication path to the memory modules. In this type of communication, arbitration hardware must be provided to

synchronize access to these memory modules by the different processors. Finally, simultaneous requests for the bus must be executed sequentially. The this type of communication in the beginning. The thesis considers performance is evaluated in terms of the effective memory bandwidth and the acceptance probability. Simulation studies are developed especially when the amount of processor-memory traffic is large. The case studies have shown that the bus is actually the performance-limiting factor in single-bus architectures for parallel processing. When the number of buses increased to approach the value  $B=\min(P,M)$ , the resulting special case is the crossbar network which is suffering only from memory conflicts. Simulation studies are presented in the thesis, and the results reveal that crossbars provide a maximum degree of performance. However, they scale up linearly in terms of performance, but at the expense of an  $O(P^2)$  complexity for a  $P \times P$  crossbar. The general multiplebus systems, when  $B \le \min(P, M)$ , are considered and their performance evaluated. Such general systems are suffering from both memory conflicts and bus arbitration. They strike a compromise between the price/performance alternatives offered by crossbars. Two models are considered. They are the even priority model and the prioritized model. In the even priority model, all processors are assumed to have the same priority level and the memory requests from each processor are assumed to be random, independent and uniformaly distributed over all the memory modules.

Prioritized multiple-bus multiprocessor systems are finally considered, in which each processor is assigned a priority level. In such systems, each processor will have different acceptance probability depending on its priority. Thus, the performance of each processor can be individually evaluated. The number of global buses can always play an effective part on the system performance as indicated from the results.

### TABLE OF CONTENTS

| CONTENTS                                 | Pag     |
|------------------------------------------|---------|
| TITLE PAGE                               | No<br>I |
| DEDICATION                               | II      |
| ACKNOWLEDGMENT                           | III     |
| ABSTRACT                                 | IV      |
| TABLE OF CONTENTS                        | V       |
| LIST OF FIGURES                          | XII     |
| ABBREVIATIONS                            | XI      |
| NOMENCLATURE                             | XII     |
|                                          | 7111    |
| CHAPTER:                                 |         |
| 1. INTRODUCTION                          | 1       |
| 1.1 Overview and Criteria                | 1       |
| 1.2 Previous Work                        | 6       |
| 1.2 Tievious Work                        | U       |
| 2. SINGLE-BUS INTERCONNECTION NETWORKS   | 1.0     |
|                                          |         |
| 2.1 Preliminaries                        | 18      |
| 2.2 Criteria                             | 21      |
| 2.3 Arbitration Algorithms               | 23      |
| 2.4 Memory/Bus Arbitration               | 26      |
| 2.4.1 1-of- <i>P</i> Memory Arbiter      | 27      |
| 2.4.2 <i>B</i> -of- <i>M</i> Bus Arbiter | 28      |
| 2.5 System Architecture                  | 30      |
| 2.6 Analysis and Results                 | 31      |
| 2.6.1 Without Resubmission of            |         |
| Blocked Requests                         | 31      |
| 2.6.2 Resubmissiom of Blocked            |         |
| Requests                                 | 43      |
| 3. CROSSBAR INTERCONNECTION NETWORKS     | 54      |
| 3.1 Preliminaries                        | 54      |
| 3.2 Analysis                             | 56      |
| 3.3 Performance                          | 65      |

|      |           | 3.3.1 Effect of Memory Modules                 | 65  |
|------|-----------|------------------------------------------------|-----|
|      |           | 3.3.2 Effect of Processors                     | 68  |
|      |           | 3.3.3 Effect of Static Request Rate            | 71  |
|      |           |                                                |     |
| 4.   | EVEN PI   | RIORITY SYSTEMS                                | 74  |
|      | 4.1       | Preliminaries                                  | 74  |
|      | 4.2       | Architecture                                   | 76  |
|      | 4.3       | Stochastic Model                               | 80  |
|      |           | 4.3.1 Without Resubmission of Blocked Requests | 80  |
|      | ×         | 4.3.2 With Resubmission of Blocked Requests    | 92  |
| 5.   | DIFFER    | ENT PRIORITY SYSTEMS                           | 110 |
|      | 5.1       | A Prioritized Model                            | 110 |
|      |           | 5.1.1 Assumptions                              | 111 |
|      |           | 5.1.2 Analytical Model                         | 112 |
|      | 5.2       | Probability of Acceptance                      | 113 |
|      |           | 5.2.1 Effect of New Requests                   | 116 |
|      |           | 5.2.2 Effect of Resubmitted                    |     |
|      |           | Requests                                       | 119 |
|      | 5.3       | Performance Results                            | 120 |
| 6.   | SUMMA     | RY AND CONCLUSION                              | 129 |
|      |           |                                                |     |
| REFE | RENCES    |                                                | 132 |
| APPE | NDIX A    |                                                |     |
| ARAB | BIC SUMMA | ARY                                            |     |

### LIST OF FIGURES

|        | Figures                                              | Page |
|--------|------------------------------------------------------|------|
| 1.1    | A multiprocessor system                              | 3    |
| 2.1    | A single-bus multiprocessor system                   | 19   |
| 2.2    | Bus control                                          | 23   |
| 2.3    | Rotating daisy chain implementation                  | 26   |
| 2.4    | 1-of-8 arbiter constructed from a tree of 1-of-2     |      |
|        | arbiters                                             | 28   |
| 2.5    | Iterative design for a <i>B-of-M</i> arbiter         | 29   |
| 2.6    | A single-bus multiprocessor architecture with        |      |
|        | memory/bus arbiter                                   | 30   |
| 2.7    | Blocked requests                                     | 36   |
| 2.8    | Algorithm to compute the performance of single-bus   |      |
|        | model                                                | 39   |
| 2.9    | Effect of request rate on the blocked requests in a  |      |
|        | system with $P=64$ , and $M=16$                      | 40   |
| 2.10   | Acceptance probability for different numbers of      |      |
|        | memory modules for a system with $P=64$ . $?$        | 41   |
| 2.11   | Effect of number of processors on the acceptance     |      |
|        | probability in a system with P=64, M=32              | 42   |
| .12(a) | Arbitraion process??                                 | 43   |
| .12(b) | Markov graph                                         | 44   |
| 2.13   | Algorithm to compute the performance of the single-  |      |
|        | bus model with resubmissiom of blocked requests      | 47   |
| 2.14   | Effect of number of memory modules on the            |      |
|        | acceptance probability in system with $P=16$ with    |      |
|        | different values of static request rate              | 48   |
| 2.15   | Effect of number of processors on the acceptance     |      |
|        | probability in system with $M=12$ with different     |      |
|        | values of static request rate                        | 49   |
| 2.16   | Effect of number of memory modules on the idle       |      |
|        | cycle in system with $P=12$ with different values of |      |
|        | static request rate                                  | 50   |

| 2.17       | Effect of number of processors on the idle cycle in                |   |  |  |
|------------|--------------------------------------------------------------------|---|--|--|
|            | system with $M=12$ with different values of static                 |   |  |  |
|            | request rate                                                       | 4 |  |  |
| 2.18       | Effect of number of memory modules on the                          |   |  |  |
|            | dynamic requests in system with $P=16$ with different              |   |  |  |
|            | values of static request rate                                      | 4 |  |  |
| 2.19       | Effect of number of processors on the dynamic                      |   |  |  |
|            | request rate in system with $M=12$ for different values            |   |  |  |
|            | of static request rate                                             |   |  |  |
| 3.1        | (a) Request at $M_i$ in $N \times N$ crossbar                      | 4 |  |  |
|            | (b) Request at $M_i$ in $P \times M$ crossbar with $P \ge M \dots$ | 4 |  |  |
|            | (c) Request at memories in an $P \times M$ crossbar with           |   |  |  |
|            | $M \ge P$                                                          |   |  |  |
| 3.2        | Crossbar interconnection network                                   |   |  |  |
| 3.3        | Arbitration process for crossbar architecture                      |   |  |  |
| 3.4        | Algorithm to compute the performance of a crossbar                 |   |  |  |
|            | model                                                              |   |  |  |
| 3.5        | Effect of number of memory modules on acceptance                   |   |  |  |
|            | probability with different values of static request rate           |   |  |  |
|            | in a crossbar system with $P=16$                                   | ( |  |  |
| 3.6        | Effect of number of memory modules on idle cycles                  |   |  |  |
|            | with different values of static request rate in a                  |   |  |  |
|            | crossbar system with P=16                                          | ( |  |  |
| <b>3.7</b> | Effect of number of memory modules on dynamic                      |   |  |  |
|            | request rate with different values of static request               |   |  |  |
|            | rate in a crossbar system with $P=16$                              |   |  |  |
| 3.8        | Effect of number of processors on the acceptance                   |   |  |  |
|            | probability with different values of static request rate           |   |  |  |
|            | in a crossbar system with M=16                                     | ( |  |  |
| 3.9        | Effect of number of processors on idle cycle with                  |   |  |  |
|            | different values of static request rate in a crossbar              |   |  |  |
|            | system with <i>M</i> =16                                           | ( |  |  |
| 3.10       | Effect of number of processors on dynamic request                  |   |  |  |
|            | rate with different values of static request rate in a             |   |  |  |
|            | crossbar system with M=16                                          | , |  |  |