## FLEXIBLE ROUTER ARCHITECTURE OF NETWORK-ON-CHIPS

A

PROJECT REPORT

Submitted in partial fulfilment of the requirement for the degree

of

#### **BACHELOR OF TECHNOLOGY**

#### IN

#### **COMPUTER SCIENCE ENGINEERING**

Under the supervision

of

## **Dr. Vivek Sehgal**

by

## Anubha Chadda (151409)

to



# JAYPEE UNIVERSITY OF INFORMATION TECHNOLOGY WAKNAGHAT, SOLAN-173234

## HIMACHAL PRADESH, INDIA

## MAY, 2019

## **STUDENT'S DECLARATION**

I hereby declare that the work presented in this report entitled "Flexible Router Architecture Of Network-On-Chip", in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering/Information Technology, submitted in the Department of Computer Science & Engineering and Information Technology, Jaypee University of Information Technology Waknaghat, is an authentic record of my own work carried out over a period from August, 2018 to May, 2019 under the supervision of Dr. Vivek Sehgal, Associate Professor, Computer Science & Engineering and Information Technology.

The matter embodied in the report has not been submitted for the award of any other degree or diploma.

Anubha Chadda,

151409

This is to certify that the above statement made by the candidate is true to the best of my knowledge.

Dr. Vivek Sehgal

Associate Professor

Computer Science & Engineering and Information Technology

Dated: May, 2019

### ACKNOWLEDGEMENT

I wish to express my sincere gratitude to **Dr. Vivek Sehgal**, for his excellent guidance and perennial encouragement and support during the course of my work in the last one year. I truly appreciate and value his profound knowledge, esteemed supervision and encouragement from the beginning to the end of this thesis.

My special thanks are due to **Prof. Dr. Satya Prakash Ghrera**, Head of the Computer Science Engineering Department, for all the facilities provided to successfully complete this work.

I also take the opportunity to thank all my friends who have directly or indirectly helped me in my project work and in the completion of this report.

Last but not least, I would like to thank my parents, who taught me the value of hard work by their own example. I would like to share this moment of happiness with my mother and father. They gave me enormous support during the whole tenure of my stay at JUIT, Waknaghat.

## CONTENTS

| Title Number | Description                                           |
|--------------|-------------------------------------------------------|
|              | Abstract                                              |
| 1.           | Introduction                                          |
| 1.1          | Problem statement                                     |
| 1.2          | Objective                                             |
| 1.3          | Methodology                                           |
| 2.           | Literature survey                                     |
| 2.1          | Literature review                                     |
| 2.2          | Findings of literature review                         |
| 2.3          | Chapter summary                                       |
| 3.           | System design                                         |
| 3.1          | Premise of Network-on-Chip                            |
| 3.2          | Main components of Network-on-Chip                    |
| 3.2.1        | Network Switches                                      |
| 3.2.2        | Resources                                             |
| 3.2.3        | Resource Network Interface                            |
| 3.3          | Architecture of Network-on-Chip                       |
| 3.3.1        | Structure of Router in Network-on-Chip                |
| 3.3.1.1      | Input port                                            |
| 3.3.1.2      | Output port                                           |
| 3.3.1.3      | Control logic                                         |
| 3.4          | Basic operation of Router                             |
| 3.5          | Network Interface in Network-on-Chip                  |
| 4.           | The proposed flexible router architecture             |
| 4.1          | Architecture of flexible router                       |
| 4.2          | Basic operation of flexible router                    |
| 4.3          | Deadlock problem                                      |
| 4.4          | The flexible router is deadlock-free under XY routing |
| 5.           | Algorithm                                             |
| 5.1          | Random mapping algorithm                              |
| 5.2          | Odd-Even routing                                      |
| 5.3          | XY routing                                            |
| 6.           | Result analysis                                       |
| 6.1          | Results                                               |
| 7.           | Conclusion                                            |
| 7.1          | Conclusion and Future Scope                           |

## **List of Figures**

| Figure Number | Description                        |
|---------------|------------------------------------|
| 1.1           | Moore's Law                        |
| 3.1           | Main components of Network-On-Chip |
| 3.2           | Router                             |
| 3.3           | Architecture of Network-On-Chip    |
| 3.4           | Input port                         |
| 3.5           | Output port                        |
| 4.1           | Flexible router architecture       |
| 4.2           | Input port                         |
| 4.3           | Deadlock between two routers       |
| 4.4           | Deadlock between four routers      |
| 5.1           | Odd-even Routing                   |
| 5.2           | Router Architecture                |
| 6.1           | Nirgam Simulator                   |

## LIST OF ACRONYMS AND ABBREVIATIONS

- ASIC - Application Specific Integrated Circuit
- DSP - Digital Signal Processor
- NoC - Network-on-Chip
- OSI - Open Systems Interconnection
- SoC - System-on-Chip
- DOR - Dimension Order Routing

## ABSTRACT

Network-on-Chip (NoC) is an efficient on-chip communication architecture for System-on-Chip (SoC) designs. With the help of an NoC, a large number of computational resources can be integrated on a single chip. An NoC replaces dedicated wiring between blocks with a network that routes packets. A main focus of NoC is to reduce the size of the chip, making it as small as possible, and an efficient routing algorithm can enhance the performance of the NoC. In this report, we discuss the base router, the flexible router architecture and two routing algorithms (XY and Odd-Even, OE). All simulations were carried out in the NIRGAM simulator.

## **Chapter 1**

## **1. INTRODUCTION**

Semiconductor technology is advancing at an exponential rate, and this has led to a dramatic increase in transistor densities. This growth in chip development is commonly explained with the help of Moore's law.

Moore gave a relationship between the number of transistors present on a single chip and time.



Moore's Law: the number of transistors on integrated circuit chips (1971-2016). Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important because other aspects of technological progress, such as processing speed or the price of electronic products, are strongly linked to Moore's law.

Fig 1.1 Moore's Law

Designing ever more complex single-core processors increases power consumption, which in turn degrades performance. Instead, designers and architects are trying to use the very large number of transistors available on a single chip effectively by pursuing multicore designs.

Today's processors already integrate a large number of cores, and the industry is entering a many-core era. Although many innovations have been made in the design and development of complex many-core processors, designers still face difficulties that remain to be solved, mainly arising from low-level logic implementation and parallel programming paradigms. Communication mechanisms affect the performance, efficiency and correctness of many-core processors.

Traditional on-chip bus mechanisms have many disadvantages for communication, such as high latency, low bandwidth, poor scalability and high power consumption. The network-on-chip is a newer technique that can address these issues. NoCs have many similarities with off-chip networks, but they differ in their physical fabric and in their power, latency and area constraints.

A Network-on-Chip replaces dedicated wiring between blocks with a network that routes packets. It is a typical example of a layered architecture in communication networks. Data is transferred from the initial node (source) to the final node (destination) in the form of packets. The processing elements that produce and consume these packets are IP (Intellectual Property) cores, processors, DSPs (Digital Signal Processors), ASICs (Application Specific Integrated Circuits), etc. The tile-based architecture of an NoC consists of processing elements, routers and the links that interconnect them.

Other on-chip interconnect options are point-to-point signal wires, shared buses and segmented buses. Conventional Systems-on-Chip suffer from a bottleneck problem, and the main reason behind it is global interconnections. NoC aims to achieve low latency, high bandwidth and scalability simultaneously while keeping the chip as small as possible. The bus-based architecture used earlier was not a reliable architecture because of its poor scalability, lack of parallelism, high latency, high power dissipation and low throughput. Network-on-Chip is a promising solution to these issues of the traditional bus-based architecture.

Today, considerable investment goes into developing new techniques that improve the efficiency, scalability and performance of the existing mesh topology.

However, NoC is not a perfect model and does not solve every problem. It has its own limitations, and exploiting parallelism or scalability at a large scale in an NoC architecture does not always give the best solution.

The main concerns with NoC architecture are low throughput, high power consumption and high communication cost.

Network-on-Chip is a versatile and adaptable communication architecture designed for SoCs. How well an NoC addresses the concerns above depends largely on the routing algorithm used.

A Network-on-Chip is a System-on-Chip whose on-chip communication is implemented as a network, hence the name NoC; communication takes place through packet switching. The routing algorithm determines the path along which a packet is transmitted from source to destination. Routers placed between the nodes of the network direct the packets according to the routing algorithm implemented in the NoC design. The best system performance, overcoming scalability and performance limitations, can therefore be achieved only by selecting a suitable network topology and routing algorithm, and this decision rests with the designer.

#### **1.1 PROBLEM STATEMENT**

The world of communications is growing at an unprecedented rate. The continuous growth in network system complexity and the need to reduce design time make it necessary to develop effective and efficient optimization strategies to assist the designer. The problem statement for the research is given below:

a) Identifying the various problems in computer networks that need optimization.

b) Exploring the possibilities of using intelligent approaches for the problems that need optimization.

c) Exploring the use of evolutionary algorithms in network routing and congestion control.

d) Identifying a representation of the problem suitable for evolutionary approaches.

e) Comparing the effectiveness of evolutionary approaches with the existing ones.

#### **1.2 OBJECTIVE**

Future multi-core systems consider the Network-on-Chip (NoC) an efficient solution for their communication demands. The overall performance of an on-chip network depends on effective buffer management, which also affects the network's energy consumption. The flexible router architecture is proposed to utilize the available hardware resources effectively and to increase the service equality. We need an architecture whose buffers can be used efficiently by all the input channels, and for this we use a flexible ring buffer architecture.

#### **1.3 METHODOLOGY**

This section discusses several common NoC design methodologies, such as flow control, routing, arbitration, quality of service, reliability and task scheduling. The behaviour of the designed systems and the computation units used to implement the system functions are described by the computation architecture design.

Computation architecture design uses two models:

- Behaviour Model
- Computation Architecture Model

The behaviour model is partitioned and mapped onto the computation architecture model.

Computation architecture design, which is followed by the NoC design, partitions a behaviour model and maps it onto a computation architecture model. The first step is communication modelling and analysis.

Communication analysis extracts the communication patterns of the application and the computation architecture.

The pattern includes communication types, data sizes and communication frequencies among all the computation units. Communication pattern information is used to guide the design of the protocol and topology. The second step is topology and protocol design based on the communication requirements. The aim of this step is to use the minimum network resources to fulfil the communication requirements shown by the communication pattern. A mature telecommunication network design and simulation environment, OPNET, is adapted for this step.

In the third step, the chip floor plan is estimated to obtain design requirements for the interconnections. This is done to estimate the delay on every interconnection in terms of clock cycles and to analyse the power and area of an NoC design.

In NoC design, interconnections rather than logic circuits dominate power consumption and area. The fourth step, performance analysis, simulates the NoC design in OPNET with the communication traces from the computation architecture.

The performance results help to compare different design choices and refine the designs. The minimum network frequency is obtained from the performance analysis. If no design meets the performance requirements, or the performance margin is too small, we need to return to the previous steps. In the fifth step, logic design can help increase the accuracy of the power and area analysis, but the interconnection design is the focus of the analysis.

We use Design Compiler to synthesize the logic circuits and Cadence SPICE as the interconnection design and simulation environment. It is also possible to automate the interconnection design by building an interconnection library, which reduces both the design task and the floor plan estimation to a table lookup. The last step is power and area analysis.

The worst-case power to transmit 1 bit of data is measured for every interconnection using SPICE. The worst-case power of the logic circuits is estimated from the number of transistors. OPNET records all the activity of the NoC. By summing the power of every activity, we get the worst-case power of the NoC. Silicon and metal area consumption is obtained by adding up all the design components.

If the power or the area does not meet the requirements, we go back to choose different circuits or revise the floor plan. In the worst case, we have to revise the topology and protocol. To test the methodology, two NoCs were designed for a high-performance embedded video system-on-chip (SoC).

## CHAPTER 2 LITERATURE SURVEY

This chapter describes the literature survey carried out to understand the concept of NoC and routing schemes. It presents a detailed literature review based on different research papers.

## **RESEARCH PAPERS**

| S NO. | AUTHOR | YEAR | TITLE |
|-------|--------|------|-------|
| 1 | Andreas Hansson, Kees Goossens and Andrei Radulescu | 2007 | "A Unified Approach to Mapping and Routing on a Network-on-Chip for Both Best Effort and Guaranteed Service Traffic" |
| 2 | Paolo Meloni, Igor Loi, Federico Angiolini, Salvatore Carta, | 2007 | "Area and Power Modeling for Networks-on-Chip with Layout Awareness" |
| 3 | Ben Soh, Hien Phan, Raghu | 2010 | "A Four Stage Design Approach Towards Securing a Vehicular Ad Hoc Networks Architecture" |
| 4 | Jiang Xu, Joerg Henkel, Srimat Chakradhar | 2009 | "A Methodology for Design, Modeling, and Analysis of Networks-on-Chip" |
| 5 | Ruchika Chandravanshi, Vivek Tiwari | 2016 | "Network-On-Chip Router Architecture Performance Analysis by using VHDL" |
| 6 | Bouraoui Chemli, Alexandre Coelho, Abdelkrim Zitouni | 2017 | "Design of Efficient Pipelined Router Architecture for 3D Network-On-Chip" |
| 7 | A. Ahmadinia, Alireza Shahrabi | 2011 | "A Highly Adaptive and Efficient Router Architecture for Network-on-Chip" |
| 8 | Seung Eun Lee and Jun Ho Bahn | | "Design of An Adaptive Router Architecture for Network-on-Chip" |
| 9 | Ran Ginosar | | "Networks on Chip (NOC)" |
| 10 | J. McCanny, Sakir Sezer and Lu | 2011 | "Generic Low-Latency NoC Router Architecture for FPGA Computing Systems" |
| 11 | A. DeOrio, David Fick, Valeria Bertacco, D. Sylvester, David Blaauw, Jin Hu, Gregory | | "A Reliable Routing Architecture and Algorithm for NoCs" |

#### WEBSITES AND BOOKS

| SOURCE                                                           | REMARKS |
|------------------------------------------------------------------|---------|
| Performance Optimization in Network-On-Chip (NoC) Architecture   |         |
| Testing Chips with Mesh-Based Network-On-Chip                    |         |
| Networks-on-Chips: Technology and Tools                          |         |

#### 2.1 LITERATURE REVIEW

The NoC architecture is formed by switches and resources placed on the slots of an m\*n mesh. A direct layout of a 2D mesh of switches and resources provides design integration at the physical-architecture level. Each router is connected to one processing element and four neighbouring routers, and each processing element is connected to one router.

A resource can be an FPGA, a processor core, memory, a custom hardware block or an IP block that fits into the available slot and complies with the interface of the NoC. The NoC architecture is essentially the on-chip communication infrastructure comprising the physical layer, the data link layer and the network layer of the Open Systems Interconnection (OSI) protocol stack model.

The NoC concept also accommodates large resources, such as large memory or register banks and FPGA areas, and special-purpose computation resources, such as high-performance multiprocessors. NoC architectures can be distinguished by their network topology, flow control scheme, routing policy, switching technique and the mechanisms applied to guarantee quality of service for data transmission.

#### 2.2 FINDINGS OF LITERATURE REVIEW

The research papers surveyed discuss the relevance and importance of NoC architecture, multistage crossbar networks, their routing schemes and testing approaches. It has been suggested that an NoC can follow a mesh, tree, ring, torus or hierarchical structure. The Processing Elements (PEs) of each NoC can be reconfigured based on deflection routing. These papers also explain the Scalable Programmable Integrated Network (SPIN), a layered on-chip micro-network design that defines packets as sequences of 32-bit words, with the packet header fitting in the first word.

To identify the destination address, the network uses a byte from the header, which enables the network to scale up to 'N' terminal nodes. In a programmable multistage network, each stage can be made programmable and reconfigurable. The programmable network enhances the switching capacity of the network and also reduces the blocking probability. Both advantages are obtained because the source subscribers are programmed with several possible routes to overcome congestion. Network security is also an issue when data is transferred over long distances. Dual Tone Multi Frequency (DTMF) signalling of multistage networks and Time Division Multiplexing (TDM) techniques are reviewed. The papers discussed here present different security algorithms.
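The header-byte addressing described above can be sketched in a few lines. This is only an illustration: the text says that one byte of the 32-bit header word carries the destination, but not which byte, so the position used here (the most significant byte) and the helper names `make_header` and `destination` are assumptions.

```python
# Sketch of a 32-bit packet header whose destination address occupies one byte.
# ASSUMPTION: the destination is the most significant byte of the header word.
DEST_SHIFT = 24
DEST_MASK = 0xFF


def make_header(dest: int, other_bits: int = 0) -> int:
    """Build a 32-bit header word with `dest` in the assumed destination byte."""
    if not 0 <= dest <= DEST_MASK:
        raise ValueError("destination must fit in one byte (0..255)")
    if not 0 <= other_bits < (1 << DEST_SHIFT):
        raise ValueError("remaining header bits must fit in 24 bits")
    return (dest << DEST_SHIFT) | other_bits


def destination(header: int) -> int:
    """Extract the destination byte from a 32-bit header word."""
    return (header >> DEST_SHIFT) & DEST_MASK


if __name__ == "__main__":
    header = make_header(dest=0x2A, other_bits=0x000123)
    print(hex(header), destination(header))
```

A single byte can distinguish at most 256 destinations, which bounds the number of terminal nodes addressable with this header layout.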

#### **2.3 CHAPTER SUMMARY**

This chapter has presented the literature survey. The findings of the various research papers supported the research work on multistage crossbar NoCs used for telecommunication switching and on reconfigurable programmable NoCs. The most important reasons for using NoC structures are their promise of flexibility and programmable network capacity. Telecommunication traffic characteristics have long been recognized as having a significant impact on multicore system design. The traffic is passed through a multistage network, and packets are routed along the shortest path over the maximum available network. These effects have important implications for the design of on-chip multimedia systems, since self-similar processes have properties that are completely different from the traditional short-range dependent or Markovian processes that have generally been used in system-level analysis. The literature review also shows that even the traffic generated by programmable cores consists of different program phases. Moreover, research in this area lags behind and is lacking because of the state of NoC benchmarks.

## CHAPTER 3 SYSTEM DESIGN

#### 3.1 Premise of Network-On-Chip

An NoC is a communication system integrated entirely on a single chip. It connects the IP cores integrated on one chip using clocked logic, which may be synchronous or asynchronous, and it can also work with un-clocked logic. It improves on-chip communication beyond what a bus or a crossbar switch can offer, and thereby improves the usefulness of SoCs. NoC is increasingly being adopted for communication among the different cores in a SoC. With the advances in IC fabrication, there has been a steady effort to place a large number of systems on the same chip to achieve more efficient and improved chips. To increase the efficiency of these systems, an efficient routing algorithm is needed. Systems-on-Chip (SoCs) have also used topologies based on shared buses. The complex design problems of on-chip communication are addressed by the NoC.

#### 3.2 Main Components of Network-On-Chip

An NoC has three primary components:

- Network Switches (routers)
- Resources (IP cores)
- Resource Network Interfaces



Figure 3.1 Main components of NOC

#### **3.2.1** Network Switches

The network switch (router) is an important part of the on-chip network, as in any other network infrastructure; routers are the backbone of a Network-on-Chip. In an on-chip network, the primary task of a router is to deliver incoming data to the destination IP core if the router is directly connected to the destination resource; otherwise the router has to forward the data to another router. A router implementation is based on three layers of communication in the OSI model, i.e. the physical, data link and network layers. A designer should keep the router simple and design it carefully so as to avoid overheads such as cost, area and power.

Implementing the routing function is the sole purpose of the router in distributed routing. For routing, a router may contain a routing table that stores the entire route; such a router is called a table-based router. Alternatively, the router implements a routing algorithm to compute the routing path dynamically. The router used for distributed routing is quite complex, since it needs memory and extra logic to implement the entire routing function.

The router consists of five ports, i.e. east, west, north, south and local, and a central crosspoint matrix. The first four ports connect to other routers and the local port connects to the IP core. Each port has an input channel and an output channel. Data packets enter through the input channel of one port and are transferred to the output channel of another port. The input and output channels have their own decoding logic, which improves the performance of the router. Buffers serve as temporary storage for data; the buffering technique used here is store-and-forward. Control logic is required to make arbitration decisions. In this way, communication is established between the input and output ports, and the connection between these ports is formed by the central crosspoint matrix.

In this work, a virtual-channel router is used for the NoC design. This router architecture consists of a virtual channel allocator, a switch allocator, a crossbar switch, a route computation unit, input channels and output channels, where the input and output channels are multiplexed over several virtual channels.



Figure 3.2 Router
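The two ways of obtaining the output port described above (computing it with a routing algorithm, or looking it up in a routing table) can be sketched as follows. This is a minimal illustration, not the router's actual implementation: it assumes a 2D mesh with `(x, y)` node coordinates, uses XY dimension-order routing as the algorithm, and assumes that "east" means increasing x and "north" means increasing y. The function names are hypothetical.

```python
# Sketch of per-hop route computation for a five-port mesh router.
# ASSUMPTIONS: nodes are addressed as (x, y); east = +x, north = +y;
# XY dimension-order routing (correct x first, then y) is the algorithm.


def xy_route(current, dest):
    """Return the output port for a packet at `current` heading to `dest`."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "local"  # the packet has arrived: deliver to the attached IP core


def build_routing_table(current, width, height):
    """Table-based variant: precompute the output port for every destination."""
    return {
        (x, y): xy_route(current, (x, y))
        for x in range(width)
        for y in range(height)
    }


if __name__ == "__main__":
    here = (1, 1)
    table = build_routing_table(here, width=3, height=3)
    for dest in [(2, 0), (1, 2), (1, 1)]:
        # Both approaches give the same answer; the table trades memory for logic.
        print(dest, xy_route(here, dest), table[dest])
```

The table-based router stores one entry per destination, while the algorithmic router needs only two comparisons per dimension, which is the memory-versus-logic trade-off mentioned in the text.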

#### **3.2.2 RESOURCES**

In a tiled, city-block style of NoC layout, the wires and routers are arranged like the street grid of a city, while the clients (e.g. IP cores or resources) are placed on the city blocks separated by wires. The IP cores or resources can be general-purpose processors, FPGAs, amplifiers, ADCs, DSPs, memory, graphics controllers, mixed-signal modules, RF units, application-specific hardware components, I/O controllers and so on. A resource must use the same implementation technology as the NoC. A designer can use their own resources instead of buying them from different vendors.

#### **3.2.3 RESOURCE NETWORK INTERFACE**

A Resource Network Interface (RNI) connects an IP core or resource to a router in the NoC. Through it, IP cores can send message packets to the network router.

The Resource Network Interface has two parts:

- I) Resource Dependent part
- II) Resource Independent part

The resource-independent part is designed so that the RNI appears as just another network switch to the router it is connected to. The method for designing the resource-independent part of the RNI is generic. For reusability, the resource-dependent part should be connected to resources with homogeneous properties; otherwise this part will be different for every resource. The resource-dependent part of the RNI provides functions such as flit formation (flitization), deflitization and encoding.

The RNI implements two layers of the OSI model: (i) the session layer and (ii) the transport layer. Here the functionalities of these two layers differ from those in the seven-layer OSI model. In theory, the session layer establishes, coordinates and terminates the conversation between applications, and hence acts as the interface for the connected IP cores, while the transport layer ensures data transfer and operates on the network interface. The transport layer offers communication services to the upper layer, i.e. the session layer, with messages serving as the communication medium. The session layer is built on the services provided by the lower layer, i.e. the transport layer, and the connected IP cores are isolated from the communication network.
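Flitization and deflitization, as performed by the resource-dependent part of the RNI, can be illustrated with a small sketch. The flit layout below (an empty head flit carrying the destination, body flits carrying the payload, and an empty tail flit) and the names `Flit`, `flitize` and `deflitize` are assumptions made for illustration; the report does not specify the flit format.

```python
# Sketch of flit formation (flitization) and reassembly (deflitization).
# ASSUMPTIONS: a message becomes head + body... + tail flits; the head flit
# carries the destination id; the default of 4 payload bytes per flit is arbitrary.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Flit:
    kind: str       # "head", "body" or "tail"
    dest: int       # destination id (only meaningful on the head flit)
    payload: bytes  # empty for head and tail flits in this sketch


def flitize(message: bytes, dest: int, flit_bytes: int = 4) -> List[Flit]:
    """Split a message into a head flit, payload-carrying body flits and a tail flit."""
    chunks = [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)]
    flits = [Flit("head", dest, b"")]
    flits += [Flit("body", dest, chunk) for chunk in chunks]
    flits.append(Flit("tail", dest, b""))
    return flits


def deflitize(flits: List[Flit]) -> Tuple[int, bytes]:
    """Reassemble (destination, message) from a complete head..tail flit sequence."""
    if len(flits) < 2 or flits[0].kind != "head" or flits[-1].kind != "tail":
        raise ValueError("expected a sequence starting with head and ending with tail")
    return flits[0].dest, b"".join(f.payload for f in flits[1:-1])


if __name__ == "__main__":
    flits = flitize(b"hello NoC!", dest=7)
    print([f.kind for f in flits])
    print(deflitize(flits))
```

Because only the RNI knows this layout, the IP core exchanges whole messages and stays isolated from the network's flit format.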

#### **3.3 ARCHITECTURE OF NOC**

The whole chip is divided into a set of interconnected blocks called nodes. Each node contains a processing element (PE), which can be, for example, a digital signal processor.

#### **3.3.1 STRUCTURE OF ROUTER IN NoC**

Routing is performed by the routers in the chip. The wires and routers form a mesh-like structure in the chip. Each router has five input ports and five output ports, corresponding to the north, south, east and west directions and the local processing element (PE). The north port is used by the router if a task must be sent in the north direction. Similarly, to send a task in the south, east or west direction, the corresponding port of the router is used. Each port has two links: an incoming link to the port and an outgoing link from the port. The router establishes connectivity between neighbouring nodes of the network.



Fig 3.3 Architecture of NoC

#### **3.3.1.1 Input port**

The input port comprises three components:

- Input controller: The upstream router sends a request (req\_US) to the input controller, which grants the request (gnt\_US) if the FIFO is not full. It also sends an internal request (req\_int) to the output port and waits for the grant (gnt\_int) in order to transfer the data packet to the downstream router according to the routing function in use.
- FIFO buffer: It stores the packets arriving from the upstream router.
- Routing logic: It checks the destination address field of the packet at the head of the FIFO to determine the appropriate output port.



Fig 3.4 Input Port

#### 3.3.1.2 Output port

It consists of three components:

• Arbiter: Receives all the requests (req\_int\_E,W,N,S,L) for the output port and grants the request of one port according to the arbitration algorithm. Round robin is mainly used because of its simplicity and fairness.

• Output controller: Handles the communication with the downstream router (using the req\_DS and gnt\_DS signals).

• MUX: Selects which packet goes to the downstream router, as decided by the arbiter.
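The round-robin policy mentioned for the arbiter can be illustrated with a small behavioural model. This is only a sketch in Python, not the hardware description used in this project; the class name `RoundRobinArbiter`, the port order in `PORTS` and the `grant` method are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumed order of the five input ports that may request this output port.
PORTS = ["E", "W", "N", "S", "L"]


class RoundRobinArbiter:
    """Behavioural model of a round-robin arbiter for one output port."""

    def __init__(self) -> None:
        # Index of the port that has the highest priority in the next round.
        self.next_index = 0

    def grant(self, requests: Dict[str, bool]) -> Optional[str]:
        """Return the granted port, or None if no port is requesting."""
        for offset in range(len(PORTS)):
            index = (self.next_index + offset) % len(PORTS)
            port = PORTS[index]
            if requests.get(port, False):
                # The winner gets the lowest priority in the following round.
                self.next_index = (index + 1) % len(PORTS)
                return port
        return None
```

Because the priority pointer moves past the last winner, a port that keeps requesting cannot starve the other requesting ports, which is the fairness property referred to above.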



Fig 3.5 Output Port

#### **3.3.1.3 CONTROL LOGIC**

For every data packet, the intended destination is determined by the head flit. The function of the control logic is to determine the output direction along which all the subsequent flits of that packet are routed, according to the routing algorithm applied.
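The behaviour described here, where the head flit fixes the direction and the following flits reuse it, could be modelled roughly as below. The flit representation, the class name `ControlLogic` and the `route` callback are hypothetical and are not taken from the project code.

```python
from typing import Callable, Optional, Tuple

# Assumed flit format: (kind, payload) with kind in {"HEAD", "BODY", "TAIL"}.
Flit = Tuple[str, object]


class ControlLogic:
    """Keeps the output direction chosen by the head flit for the whole packet."""

    def __init__(self, route: Callable[[object], str]) -> None:
        self.route = route  # routing function applied to the head flit payload
        self.direction: Optional[str] = None

    def output_for(self, flit: Flit) -> str:
        kind, payload = flit
        if kind == "HEAD":
            # The head flit carries the destination and fixes the direction.
            self.direction = self.route(payload)
        if self.direction is None:
            raise ValueError("body or tail flit received before a head flit")
        direction = self.direction
        if kind == "TAIL":
            # The tail flit ends the packet and releases the stored direction.
            self.direction = None
        return direction
```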

#### **3.3.2 BASIC OPERATION OF ROUTER**

When a request (req\_US) arrives from one of the neighbouring upstream routers, the input controller checks the FIFO buffer for an available slot. If the FIFO is full, the input controller waits until at least one slot (one slot holds one packet) becomes free. The controller then sets the grant (gnt\_US) signal to the upstream router, announcing that the packet has been stored in the FIFO. The packet then waits until it reaches the head of the FIFO, at which point the routing logic checks the destination address field of the packet to determine the appropriate output port. After determining the output port, the input controller issues a request (req\_int) to that output port and waits for it to be granted.

Assume that the output port has several simultaneous requests (req\_int\_E,W,N,S,L) from other input ports besides the request under consideration; it selects one of them according to the arbitration algorithm. The arbiter selects one of the requests and triggers the output controller to issue a request (req\_DS) to the downstream router and wait for the grant.
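The request and grant sequence on the input side can be summarised with a short behavioural sketch. It is a simplified software model under stated assumptions: one call corresponds to one handshake, a packet is reduced to its destination coordinates, and the class and method names (`BaseInputPort`, `req_us`, `req_int`, `gnt_int`) only mirror the signal names used in the text.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Packet = Tuple[int, int]  # assumed packet format: destination (x, y) only


class BaseInputPort:
    """Behavioural model of one Base-router input port (controller + FIFO)."""

    def __init__(self, depth: int, route: Callable[[Packet], str]) -> None:
        self.depth = depth  # number of packet slots in the FIFO
        self.fifo: Deque[Packet] = deque()
        self.route = route  # routing logic: packet -> output port name

    def req_us(self, packet: Packet) -> bool:
        """Upstream request; the return value plays the role of gnt_US."""
        if len(self.fifo) >= self.depth:
            return False  # FIFO full: the upstream router must keep waiting
        self.fifo.append(packet)
        return True

    def req_int(self) -> Optional[str]:
        """Output port requested by the packet at the head of the FIFO."""
        if not self.fifo:
            return None
        return self.route(self.fifo[0])

    def gnt_int(self) -> Packet:
        """Called when the output port grants: the head packet leaves the FIFO."""
        return self.fifo.popleft()
```

The `False` result of `req_us` corresponds to the blocking case in which the upstream router has to wait for a free slot.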

#### **3.3.3 NETWORK INTERFACE IN NOC**

The network interface module converts data packets into flits: a packet generated by the client logic is split into fixed-length flow control digits (flits). The packet is divided into a header flit, a tail flit and a number of body flits in between. This group of flits is then routed from one router to its neighbouring router, hop by hop, until it reaches the destination.
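As an illustration of this conversion, the sketch below splits a payload into fixed-length flits. The layout is an assumption made for the example: the head flit holds the destination coordinates as two bytes, the payload is cut into chunks of `flit_size` bytes, the last chunk is zero-padded, and the last chunk is marked as the tail flit. The function name `packetize` is hypothetical.

```python
from typing import List, Tuple

Flit = Tuple[str, bytes]  # (flit kind, fixed-length content); assumed format


def packetize(dest: Tuple[int, int], payload: bytes, flit_size: int) -> List[Flit]:
    """Split a packet into a head flit, body flits and a tail flit."""
    if flit_size < 2:
        raise ValueError("flit_size must hold at least the two destination bytes")
    # Head flit: destination coordinates (each 0..255), padded to the flit length.
    head = bytes(dest).ljust(flit_size, b"\x00")
    # Cut the payload into fixed-length chunks and pad the last one.
    chunks = [
        payload[i:i + flit_size].ljust(flit_size, b"\x00")
        for i in range(0, len(payload), flit_size)
    ]
    if not chunks:
        chunks = [bytes(flit_size)]  # an empty payload still needs a tail flit
    body = [("BODY", chunk) for chunk in chunks[:-1]]
    return [("HEAD", head)] + body + [("TAIL", chunks[-1])]
```

For example, a 10-byte payload with `flit_size=4` gives one head flit, two body flits and one padded tail flit.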

#### **3.3.4 LIMITATION**

Contention problem: Contention is the main limitation of the Base router. It occurs when a request is blocked at an input port because the requested FIFO is full. This blocking can in turn cause further blocking elsewhere in the network, which degrades the performance of the whole network.

#### CHAPTER 4

#### THE PROPOSED FLEXIBLE ROUTER ARCHITECTURE

#### **4.1 ARCHITECTURE OF FLEXIBLE ROUTER**

As shown in Fig. 4.1, the Flexible router has the same architecture as the Base router, with five input ports and five output ports, except that additional functionality and modules are added to the input ports.



Figure 4.1 Flexible router architecture

Input port: Fig. 4.2 shows the East input port as an example. It consists of three main modules:

• FIFO Flexibility Controller (FFC): To store an incoming packet, the FFC searches for a free FIFO in the router (using request and grant signals). The FFC also communicates with the output port in order to transfer the received packet to the downstream router according to the routing algorithm.

• FIFO buffer: Unlike the FIFOs of the Base router, this FIFO can receive packets not only from the directly connected upstream router but also from other input ports (by granting the req\_FFCW,N,S\_FIFOE requests shown in Fig. 4.2, which come from the other FFCs in the router).

• Routing logic: Applies the routing algorithm to determine the packet direction and hence the appropriate output port.

Output port: It is exactly the same as the output port of the Base router.



Figure 4.2 Input port

#### **4.2 BASIC OPERATION OF FLEXIBLE ROUTER**

The Base router and the Flexible router perform the same when there is no contention; under contention, the Flexible router performs better. In that case, the Flexible router does not wait until the requested full FIFO has at least one free slot, as the Base router does. Instead, the FFC searches for a free slot in any suitable non-full FIFO in the router (including the FIFO of its own input port, since a slot may have become free during the search) by requesting (req\_FFCE\_FIFOW,N,S) the non-full FIFOs of the other input ports. Once it finds a free slot, it grants the request back to the upstream router (gnt\_US). The packet is then transferred (as one of pkt\_E,W,N,S) to the selected FIFO. From then on, the operation of the Flexible router is exactly the same as that of the Base router.
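The search performed by the FFC can be sketched as follows. This is a simplified model that ignores the direction restrictions discussed in Section 4.3; the FIFO depth, the search order (own FIFO first, then the others) and the function name `ffc_select_fifo` are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Dict, Optional

FIFO_DEPTH = 4  # assumed number of packet slots per FIFO


def ffc_select_fifo(own_port: str, fifos: Dict[str, Deque]) -> Optional[str]:
    """Return the name of a FIFO with a free slot, or None if all are full.

    The FIFO of the receiving port is tried first; the FIFOs of the other
    input ports are tried afterwards, in dictionary order.
    """
    candidates = [own_port] + [port for port in fifos if port != own_port]
    for port in candidates:
        if len(fifos[port]) < FIFO_DEPTH:
            return port
    return None
```

A result of `None` corresponds to the case where the upstream router still has to wait, exactly as in the Base router.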

#### **4.3 DEADLOCK PROBLEM**

If the Flexible router is designed to store any packet in any buffer regardless of its direction, deadlock may occur. For example, in Fig. 4.3, all packets stored in R1 are going East and at the same time all packets in R2 are going West, hence a deadlock occurs. A deadlock may also occur between four routers, as shown in Fig. 4.4.



Fig 4.3 Deadlock between two routers



Fig 4.4 Deadlock between four routers

To avoid deadlock, some restrictions must be placed on where the router can store the incoming packets. These restrictions are based on the direction of the incoming packets.

An analogy was made between the buffers of the Base router and the buffers of the Flexible router in terms of which packet directions each buffer can hold in the Base router when operating under the XY routing algorithm. For example, in the Base router the East buffer can contain packets directed to North, South, West, or Local. According to that analogy, the East buffer of the Flexible router is designed to accept packets directed to the same directions (North, South, West, or Local), and the same idea is applied to the rest of the buffers.

This solution is based on the turn model. Some turns are prohibited in order to avoid complete cycles and thereby prevent deadlock. Basically, the turn model is used to determine whether the routing algorithm under consideration is deadlock free or not. For example, to apply this model to XY routing, all we need to do is determine every possible turn that can occur in routing. The possible turns in XY routing are from East to North or South (turns 1 and 8) and from West to North or South (turns 6 and 3).



Figure 4.5

## 4.4 THE FLEXIBLE ROUTER IS DEADLOCK FREE UNDER XY ROUTING ALGORITHM

By applying the turn model to the Flexible router operating under XY routing, we can show that it is deadlock free. The same steps are followed: we identify every possible turn that can occur in the Flexible router operating under the XY routing algorithm.

From the analogy with the Base router under XY routing, the packet directions that each buffer can store in the Flexible router are as follows:

• North buffer: Can contain packets directed to Local or South. Packets directed to the Local port have reached their destination and are absorbed directly by the Local port. For packets directed to South this is not a turn, since it is a vertical movement; therefore packets in the North buffer cannot make turns.

• South buffer: Can contain packets directed to Local or North. This is similar to the North buffer, and therefore no turns originate from this buffer.

• East buffer: Can contain packets directed to Local, North, South, or West. Here too, Local and West do not make turns, because Local packets are absorbed and the West direction is just a horizontal movement. The North and South directions, however, represent two turns: turns 1 and 8 respectively in Fig. 4.5.

• West buffer: Can contain packets directed to Local, North, South, or East. Using the same reasoning as for the East buffer, this buffer contributes turns 6 and 3 in Fig. 4.5 for the North and South directions respectively.

The only turns that can occur in the Flexible router are therefore turns 1, 8, 6 and 3, which are the same turns that occur in XY routing. Since these turns cannot form a complete cycle, the Flexible router is deadlock free under the XY routing algorithm.
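The buffer restrictions listed above can be written as a lookup table that an FFC would consult before storing a packet. The sketch below only encodes the four bullet points; the names `ALLOWED` and `eligible_buffers` and the single-letter direction codes are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List

# Packet directions each buffer may hold under XY routing, as listed above.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "N": frozenset({"L", "S"}),
    "S": frozenset({"L", "N"}),
    "E": frozenset({"L", "N", "S", "W"}),
    "W": frozenset({"L", "N", "S", "E"}),
}


def eligible_buffers(direction: str) -> List[str]:
    """Buffers that may store a packet heading in `direction` (sorted by name)."""
    return sorted(buf for buf, dirs in ALLOWED.items() if direction in dirs)
```

Under this table a packet heading East can only be stored in the West buffer, while a packet heading North may use the East, South or West buffer, so the flexibility mainly benefits packets that have finished or are finishing their horizontal movement.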

#### **CHAPTER 5**

#### ALGORITHMS

Network-on-Chip (NoC) is a scalable and efficient communication architecture for System-on-Chip (SoC) designs. An NoC differs from a conventional SoC interconnect in that communication is implemented in the form of a network; it can be seen as another form of SoC. Communication in an NoC is done by packet switching. The routing algorithm decides along which path a packet is transmitted from source to destination. Routers are placed between the nodes of the network and direct the packets according to the routing algorithm implemented in the NoC design. It is therefore up to the designer to select a suitable routing algorithm and a suitable network topology to achieve the best system performance and overcome latency and performance limitations. The communication performance of an NoC depends heavily on its routing algorithm. Latency, throughput and load distribution are important parameters to consider during design.

#### **5.1 Random Mapping Algorithm**

Random mapping is the algorithm most commonly used by researchers, but it has many issues: latency, service time and queuing time are not handled by the random algorithm for NoC. In the best case, the randomly chosen cores are selected with equal probability, so the tasks are mapped onto the cores uniformly. In the worst case, the same core is chosen for every task. All tasks are then mapped onto the same core, and new tasks remain in the queue, possibly indefinitely, until the core is ready to process them.

Consider a case in which the last core of the grid is chosen for a task: the latency involved in mapping the task to that core will be very high. Mapping tasks onto cores with the random algorithm can therefore take a large amount of time and consume a lot of energy, so various algorithms have been proposed to improve performance.
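A minimal sketch of random mapping on a 2D mesh is given below to make the best and worst cases concrete. It assumes that every task is mapped independently, that tasks enter the mesh at core (0, 0), and that the hop count is a rough proxy for mapping latency; none of these assumptions come from the project code, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Core = Tuple[int, int]


def random_mapping(tasks: List[str], width: int, height: int,
                   rng: random.Random) -> Dict[str, Core]:
    """Map every task to a core chosen uniformly at random."""
    cores = [(x, y) for y in range(height) for x in range(width)]
    return {task: rng.choice(cores) for task in tasks}


def core_load(mapping: Dict[str, Core]) -> Counter:
    """Number of tasks queued on each core."""
    return Counter(mapping.values())


def hops_from_origin(core: Core) -> int:
    """Manhattan hop count from the assumed entry core (0, 0)."""
    return core[0] + core[1]
```

In a 4x3 mesh, the last core (3, 2) is five hops away from the entry core, and a `core_load` result with a single key corresponds to the worst case in which all tasks land on the same core.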

## **5.2 ODD-EVEN Routing (OE)**

This algorithm is based on the odd-even turn model and is a distributed adaptive routing algorithm. It applies some restrictions to avoid and prevent deadlock. The odd-even turn model enables deadlock-free routing in two-dimensional (2D) meshes without virtual channels.

(x, y) coordinates are used to identify each node in an X\*Y two-dimensional mesh. A column is called even if its x coordinate is an even number and odd if its x coordinate is an odd number. A 90-degree change of travelling direction is called a turn. There are eight possible turns in total. A turn is called an ES turn if it involves a change of direction from East to South. Similarly, we can define the other seven types of turns, namely EN, WS, WN, SE, SW, NE, and NW turns, where E, W, S, and N denote East, West, South, and North, respectively.

There are two main theorems associated with the odd-even algorithm.

- Theorem 1: No packet is permitted to make an EN turn at a node located in an even column. Also, no packet is permitted to make an NW turn at a node located in an odd column.
- Theorem 2: No packet is permitted to make an ES turn at a node located in an even column. Also, no packet is permitted to make an SW turn at a node located in an odd column.

The following is a minimal OE routing algorithm in which available\_set\_dimension contains the directions that are available for forwarding the packet:

/\* OE routing algorithm \*/

/\* Source router: (SourceX,SourceY); destination router: (DestinationX,DestinationY); current router: (CurrentX,CurrentY). Below, Cx = CurrentX, Dx = DestinationX, Ex = DestinationX - CurrentX and Ey = DestinationY - CurrentY. available\_set\_dimension is an array consisting of all possible routing directions. \*/

#### Begin

- 1. available\_set\_dimension is empty.
- 2. If (Ex=0 and Ey=0), use the local port for routing.
- 3. If (Ex=0), the packet has reached the destination column.
- 4. If the above condition is satisfied, check Ey and add either SOUTH or NORTH to available\_set\_dimension.
- 5. If (Ex>0), check Ey.

If Ey=0, add EAST to available\_set\_dimension; otherwise check whether Cx is odd or Cx=SourceX.

- 6. If the above condition is true, check Ey and accordingly add SOUTH or NORTH to available\_set\_dimension.
- 7. Still for Ex>0 and Ey!=0, add EAST to available\_set\_dimension if Dx is odd or Ex!=1.
- 8. If (Ex<0), add WEST to available\_set\_dimension and check whether Cx is even.
- 9. If Cx is even and Ey!=0, check Ey and add NORTH or SOUTH to available\_set\_dimension.

End
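The steps above can be sketched in Python as follows. The sketch follows the commonly published form of minimal odd-even routing and rests on two assumptions: North corresponds to increasing y (as in the XY pseudocode of this chapter), and the vertical-move test for eastbound packets uses the source column (Cx = SourceX). The function name `odd_even_route` and the direction strings are illustrative.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def odd_even_route(source: Node, current: Node, dest: Node) -> List[str]:
    """Return the output directions allowed by minimal odd-even routing."""
    sx, _ = source
    cx, cy = current
    dx, dy = dest
    ex, ey = dx - cx, dy - cy
    if ex == 0 and ey == 0:
        return ["LOCAL"]
    vertical = "NORTH" if ey > 0 else "SOUTH"
    available: List[str] = []
    if ex == 0:
        # Already in the destination column: only vertical movement is left.
        available.append(vertical)
    elif ex > 0:
        # Eastbound packet.
        if ey == 0:
            available.append("EAST")
        else:
            # Going vertical here is an EN/ES turn, which is only permitted in
            # odd columns; in the source column it is not a turn at all.
            if cx % 2 == 1 or cx == sx:
                available.append(vertical)
            # Keep going East unless the destination is in the next column and
            # that column is even (the final EN/ES turn would be forbidden).
            if dx % 2 == 1 or ex != 1:
                available.append("EAST")
    else:
        # Westbound packet: West is always allowed; a vertical move is only
        # allowed in even columns, so that a later NW/SW turn is legal.
        available.append("WEST")
        if ey != 0 and cx % 2 == 0:
            available.append(vertical)
    return available
```

When the returned list holds two directions, the router is free to pick either one, for example the less congested output port; this choice is what makes the algorithm adaptive.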



Fig 5.1 Odd-Even Routing

#### **5.3 XY Routing**

A deterministic routing algorithm uses a fixed routing path throughout the process. The XY routing algorithm is a deterministic routing algorithm. It can be implemented for both regular and irregular network topologies. It is also called dimension order routing (DOR) and follows the concept of minimal turn routing. Each node or router of the NoC is identified by its (x, y) coordinates in a 2D mesh. According to this algorithm, data packets first travel in the X direction towards the destination column. After reaching the destination column, the packets travel in the Y direction to the destination node. In short, the algorithm states: "first the data moves in the X direction and then in the Y direction", which is why it is called the XY routing algorithm. Packets cannot move first in the Y direction and then in the X direction. This imposes turn restrictions, which make the algorithm deadlock free.

According to this algorithm, the (x, y) coordinates of the current router are compared with the (x, y) coordinates of the destination router. If they are equal, the data packets have reached their destination and are routed to the resource through the local port. If they are not equal, the x coordinate of the destination is first compared with the x coordinate of the current router. When the x coordinate of the current router is less than that of the destination router, the packets are sent towards the East; when it is greater, the packets are routed towards the West. The algorithm then checks whether the packet is already in the destination column. If so, the vertical address of the destination is compared with the vertical address of the current router: the packet is routed to North when the vertical address of the current router is less than that of the destination router, and to South when it is greater.



Fig 5.2 Router architecture

```
/* XY routing algorithm */
/* Source router: (SourceX, SourceY);
   destination router: (DestinationX, DestinationY);
   current router: (CurrentX, CurrentY). */
begin
  if (DestinationX > CurrentX)
    return EAST;
  else if (DestinationX < CurrentX)
    return WEST;
  else /* DestinationX = CurrentX */
    if (DestinationY < CurrentY)
      return SOUTH;
    else if (DestinationY > CurrentY)
      return NORTH;
    else /* DestinationY = CurrentY */
      return LOCAL;
end
```
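The XY pseudocode translates almost directly into a runnable form. The sketch below assumes that East corresponds to increasing x and North to increasing y, as in the pseudocode; the helper `xy_path`, which lists the routers visited by a packet, is added purely for illustration.

```python
from typing import List, Tuple

Node = Tuple[int, int]

# Coordinate change for one hop in each direction (North = +y is assumed).
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_route(current: Node, dest: Node) -> str:
    """Return the output direction chosen by XY routing at `current`."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"


def xy_path(source: Node, dest: Node) -> List[Node]:
    """List every router visited from `source` to `dest` under XY routing."""
    path = [source]
    node = source
    direction = xy_route(node, dest)
    while direction != "LOCAL":
        step_x, step_y = STEP[direction]
        node = (node[0] + step_x, node[1] + step_y)
        path.append(node)
        direction = xy_route(node, dest)
    return path
```

For a packet from (0, 0) to (2, 1), `xy_path` gives (0, 0), (1, 0), (2, 0), (2, 1): the X offset is removed completely before any movement in Y, so the only turns are from a horizontal to a vertical direction.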

### **CHAPTER 6**

#### RESULTS

Average delay and throughput comparisons between the Flexible and the Base routers are made under the HS, UNI and NN traffic patterns.

For low injection rates, both the Base and the Flexible routers have approximately the same average delay and throughput, and the advantage of flexibility only appears at higher rates. This is because at low rates the number of packets injected into the network per cycle is small; hence the amount of contention is also small, and the flexibility advantage is not used much. At higher injection rates the congestion increases, so the average delay increases as well and the throughput decreases.

At some point the router saturates and the delay becomes very high; we call this point the saturation point. The Flexible router saturates at higher injection rates than the Base router. For HS traffic there is an 11.4% increase in the saturation rate, while for UNI and NN there is only a small improvement.

We suspect that UNI and NN are not much improved for two reasons:

1. The communication overhead due to the FFC: the upstream router must send the request to the FFC, and the FFC in turn searches for the free slot and gives the grant back to the upstream router. If the upstream router could search by itself directly, this communication overhead could be reduced and the performance could be improved.

2. The b2b deadlock problem, which causes some blocking: when it happens, the packet spends a few cycles with no response, and then another request is initiated to another FIFO. This may affect other packets behind this packet and results in more delay for other packets, just as blocking due to contention does.

Out-of-order packet analysis

We have studied this side effect to find out how much it affects the operation of the Flexible router. The HS traffic pattern was used because for HS the probability of having two successive packets generated from the same source and going to the same destination is high. This analysis is made at the saturation injection rate, and the total number of sent packets is 25 000. In this analysis, different lagging distances (the difference in position between the correct and the incorrect order of the packets) are determined, along with the number of out-of-order packets that arrived with each lagging. Most of the out-of-order packets (402) arrived with lagging 1, only 10 packets arrived with lagging 2, and only one packet arrived with the maximum lagging of 3. In total, there are 413 out-of-order packets out of the 25 000 sent packets, which is just 1.65%. From these results we can conclude that this side effect is limited and can easily be corrected at the destination node.
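The percentage quoted above can be checked from the reported lagging counts with a few lines of Python. The numbers are taken directly from the text; the function name `out_of_order_share` is hypothetical.

```python
from typing import Dict


def out_of_order_share(lagging_counts: Dict[int, int], total_sent: int) -> float:
    """Percentage of sent packets that arrived out of order."""
    return 100.0 * sum(lagging_counts.values()) / total_sent


# Reported counts: lagging distance -> number of out-of-order packets.
reported = {1: 402, 2: 10, 3: 1}
share = out_of_order_share(reported, 25_000)  # 413 packets out of 25 000
```

Rounded to two decimals, `share` agrees with the 1.65% stated in the analysis.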

| Load (%) | Average Latency (XY) | Average Latency (OE) |
|----------|----------------------|----------------------|
| 10       | 51.8059              | 59.1711              |
| 30       | 52.7364              | 61.3614              |
| 50       | 56.4783              | 63.661               |
| 70       | 66.4793              | 63.923               |
| 90       | 71.3295              | 66.417               |
| 100      | 75.1785              | 68.915               |

| Load (%) | Average Throughput, XY (Gbps) | Average Throughput, OE (Gbps) |
|----------|-------------------------------|-------------------------------|
| 10       | 2.0193                        | 4.587                         |
| 20       | 3.0875                        | 6.548                         |
| 30       | 5.874                         | 12.581                        |
| 50       | 5.214                         | 14.874                        |
| 90       | 6.987                         | 17.957                        |
| 100      | 8.154                         | 23.956                        |

<pre>
tile 0 core received 0 packets
tile 1 core received 0 packets
tile 2 core received 0 packets
tile 3 core received 0 packets
tile 4 core received 0 packets
tile 5 core received 1 packets
tile 6 core received 1 packets
tile 7 core received 0 packets
tile 8 core received 0 packets
tile 9 core received 0 packets
tile 10 core received 0 packets
tile 11 core received 0 packets
SystemC: simulation stopped by user.

Simulation complete!
Event Log created in nirgam/log/nirgam/event.log
Simulation results and gnuplot graphs stored in nirgam/results/str_concat
Input data files for matlab created in nirgam/log/matlab
Run "plot graphs" from directory nirgam/matlab to generate graphs in matlab
Exit NIRGAM!

Info: (I804) /IEEE_Std_1666/deprecated: You can turn off warnings about
IEEE 1666 deprecated features by placing this method call as the
first statement in your sc_main() function:
sc_report_handler::set_actions("/IEEE_Std_1666/deprecated", SC_DO_NOTHING);
</pre>

Figure 6.1 Nirgam simulator

### **CHAPTER 7**

#### CONCLUSION

#### **7.1 Conclusion and Future Scope**

Tasks need to be mapped onto the most suitable cores so that latency, service time, queuing time and energy consumption are minimized. Different researchers have proposed different mapping algorithms, each with its own parameters. Our main objective is therefore to propose a mapping algorithm that is best suited in terms of latency, service time, queuing time and energy consumption. In addition, the mapping algorithm should map the tasks onto suitable cores without causing deadlock. All the algorithms discussed are implemented on a 2-D mesh topology.

Different kinds of network conditions call for different kinds of XY routing, and the selection of an XY routing algorithm depends on the application and on the movement of the packets in the network.

Since simplicity of implementation is important in every design, the XY routing algorithm is widely used. For fewer collisions, the Intermittent Routing Algorithm is preferred. The Destination XY routing algorithm achieves a better balance in load distribution and provides deadlock-free and livelock-free operation. Where the accuracy of the received data cannot be compromised, we choose fault-tolerant routing such as XYX. If the application is focused on network resource utilization, the Adaptive XY routing algorithm is the best choice. Finally, we can say that the choice of XY routing algorithm depends entirely on the conditions of the NoC architecture.

As future work, we plan to further improve the Flexible router design to enhance its performance. We will also evaluate the performance of the Flexible router under different routing algorithms.
