Computer organization architecture pdf
ROM is a non-volatile memory, meaning that information stored in ROM remains in the memory even when there is no power or when the computer is turned off. RAM is volatile memory, meaning that the information is lost when it loses power. However, RAM can be accessed very fast, in any random order.
It is very expensive.
Assembly language: (A) uses alphabetic codes in place of the binary numbers used in machine language; (B) is the easiest language in which to write programs; (C) need not be translated into machine language; (D) none of these. Answer: (A) uses alphabetic codes in place of the binary numbers used in machine language.
Computer organization refers to the operational units and their interconnections that realize the architectural specifications.
Examples of architectural attributes include the instruction set and the number of bits used to represent various data types. Organizational attributes include hardware details transparent to the programmer, such as control signals; interfaces between the computer and peripherals; and the memory technology used. Computer function refers to the operation of each individual component as part of the structure.
Main memory: Stores data. A common example of system interconnection is by means of a system bus, consisting of a number of conducting wires to which all the other components attach.
The computer gets its instructions by reading them from memory, and a program can be set or altered by setting the values of a portion of memory. Thus, a program that executes on one machine will also execute on any other.
Similar or identical operating system: The same basic operating system is available for all family members. Increasing speed: The rate of instruction execution increases in going from lower to higher family members. Increasing memory size: Memory size increases in going from lower to higher family members.
Increasing cost: Cost increases in going from lower to higher family members. The vectors A, B, and C are each stored in contiguous locations in memory, beginning at their respective base locations. The program begins with the left half of location 3. A counting variable N is initialized and decremented after each step until it reaches –1. Thus, the vectors are processed from high location to low location. Opcode; Operand. (b) First, the CPU must access memory to fetch the instruction.
The instruction contains the address of the data we want to load. During the execute phase, the CPU accesses memory to load the data value located at that address, for a total of two trips to memory. The CPU then asserts the Read control line to memory and places the address on the address bus. Memory places the contents of the memory location on the data bus. This data is then transferred to the MBR. The CPU then asserts the Write control line to memory and places the address on the address bus and the data on the data bus.
Memory transfers the data on the data bus into the corresponding memory location. When an address is presented to a memory module, there is some time delay before the read or write operation can be performed. While this is happening, an address can be presented to the other module. For a series of requests for successive words, the maximum rate is doubled. A system is only as fast as its slowest link. In recent years, the bottlenecks have been the performance of memory modules and bus speed.
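The doubling effect for two interleaved modules can be illustrated with a small timing model. This is a sketch, not the text's example: the 100 ns module access time is an assumed value chosen only for illustration.

```python
# Why two-way interleaving doubles the maximum rate for successive words:
# while one module is busy completing an access, the address for the next
# word can be presented to the other module.

MODULE_TIME = 100  # ns per module access (assumed for illustration)

def total_time(num_words, interleaved):
    if not interleaved:
        # One module: accesses are strictly serial.
        return num_words * MODULE_TIME
    # Two modules serving alternating words: after the first access,
    # a new word completes every MODULE_TIME / 2 in steady state.
    return MODULE_TIME + (num_words - 1) * MODULE_TIME // 2

print(total_time(8, interleaved=False))  # 800
print(total_time(8, interleaved=True))   # 450
```

In steady state the interleaved arrangement delivers one word every half module time, which is exactly the doubling of the maximum rate described above.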
Different systems are not comparable on clock speed alone. Other factors, such as the system components (memory, buses, architecture) and the instruction sets, must also be taken into account. A more accurate measure is to run both systems on a benchmark. Benchmark programs exist for certain tasks, such as running office applications, performing floating-point operations, graphics operations, and so on.
The systems can be compared to each other on how long they take to complete these tasks. According to Apple Computer, the G4 is comparable to or better than a higher-clock-speed Pentium on many benchmarks. If we could have an arbitrary number of these tubes ON at the same time, then those same tubes could be treated as binary bits. With ten bits, we can represent 2^10 patterns, or 1,024 patterns. For integers, these patterns could be used to represent the numbers from 0 through 1,023. Recall that the larger the ratio, the higher the speed.
Based on (a), R is the slowest machine, by a significant amount. Based on (b), M is the slowest machine, by a modest amount. Similarly, machine Z is half as fast as X for benchmark 1, but twice as fast for benchmark 2. Intuitively, these three machines have equivalent performance.
Clearly, the arithmetic mean is worthless in this context. When the geometric mean is used, the three machines are shown to have equal performance when normalized to X, and also equal performance when normalized to Y.
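The arithmetic-versus-geometric-mean behavior described above can be reproduced in a few lines. The benchmark times below are made-up values, chosen only so that Z is half as fast as X on one benchmark and twice as fast on the other, mirroring the relationship in the text.

```python
# Comparing two machines across benchmarks using the arithmetic vs the
# geometric mean of execution-time ratios normalized to machine X.
from math import prod

times_x = [10.0, 40.0]   # machine X on benchmarks 1 and 2 (assumed values)
times_z = [20.0, 20.0]   # machine Z: half as fast on 1, twice as fast on 2

ratios = [z / x for z, x in zip(times_z, times_x)]  # normalized to X

arith = sum(ratios) / len(ratios)
geo = prod(ratios) ** (1 / len(ratios))

print(arith)  # 1.25 -> arithmetic mean suggests Z is 25% slower
print(geo)    # 1.0  -> geometric mean says the machines are equivalent
```

The geometric mean gives the same verdict no matter which machine is used as the normalization reference, which is why it matches intuition here while the arithmetic mean does not.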
These results are much more in line with our intuition. Assuming the same instruction mix means that the additional instructions for each task should be allocated proportionally among the instruction types. The CPI has increased due to the increased time for memory access.
There is a corresponding drop in the MIPS rate. The speedup factor is the ratio of the execution times. Using Equation 2. The answer to this question depends on how we interpret Amdahl's law. There are two inefficiencies in the parallel system. First, there are additional instructions added to coordinate between threads.
Second, there is contention for memory access. The way that the problem is stated, none of the code is inherently serial. All of it is parallelizable, but with scheduling overhead. One could argue that the memory access conflict means that to some extent memory reference instructions are not parallelizable.
But based on the information given, it is not clear how to quantify this effect in Amdahl's equation.

Data processing: The processor may perform some arithmetic or logic operation on data.
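For reference, the Amdahl's-law speedup discussed in the preceding answer can be written as a one-line function. This is the standard formulation, with f the parallelizable fraction of execution time and n the number of processors; the example values are illustrative.

```python
# Amdahl's law: speedup when a fraction f of a program's execution time
# can be spread across n processors and the rest (1 - f) stays serial.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

print(amdahl_speedup(1.0, 8))   # 8.0 -- fully parallelizable code
print(amdahl_speedup(0.9, 8))   # ~4.71 -- a 10% serial part dominates
```

The second call shows why the coordination overhead and memory contention discussed above matter: even a small effectively-serial fraction sharply limits the achievable speedup.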
Control: An instruction may specify that the sequence of execution be altered. Instruction fetch (if): Read the instruction from its memory location into the processor. Instruction operation decoding (iod): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
Data operation (do): Perform the operation indicated in the instruction. Processor to memory: The processor writes a unit of data to memory. This (1) reduces propagation delay, because each bus can be shorter, and (2) reduces bottleneck effects. Address and data pins: Include 32 lines that are time-multiplexed for addresses and data.
Interface control pins: Control the timing of transactions and provide coordination among initiators and targets. Arbitration pins: Unlike the other PCI signal lines, these are not shared lines. Cache support pins: These pins are needed to support a memory on PCI that can be cached in the processor or another device.
The PC contains the address of the first instruction. This value is loaded into the MAR. The value in that location, which is the instruction (a value in hexadecimal), is loaded into the MBR, and the PC is incremented. These two steps can be done in parallel. The value in the addressed location is loaded into the MBR. The value in that location, which is the next instruction, is loaded into the MBR, and the PC is incremented.
The value in the MBR is stored in the addressed location. However, because the data bus is only 16 bits, it will require 2 cycles to fetch an instruction or operand wider than 16 bits. Thus a more complex memory interface control is needed to latch the first part of the address and then the second part, because the transfer completes in two steps.
The program counter must be at least 24 bits. Typically, such a microprocessor will have an external address bus and a program counter of the same width, unless on-chip segment registers are used that may work with a smaller program counter. If the instruction register is to contain the whole instruction, it will have to be as long as the instruction; if it will contain only the op code (in which case it is called the op code register), then it will have to be 8 bits long.
The Teletype sets FGI to 1 after the word is printed. The process described in (a) is very wasteful. If interrupts are used, the Teletype can issue an interrupt to the CPU whenever it is ready to accept or send data. During a single bus cycle, the 8-bit microprocessor transfers one byte while the 16-bit microprocessor transfers two bytes.
The 16-bit microprocessor has twice the data transfer rate. Suppose we do 100 transfers of operands and instructions, of which 50 are one byte long and 50 are two bytes long. Thus, the data transfer rates differ by a factor of 1.5. This requires that the priority signal propagate the length of the daisy chain (Figure 3). Thus, the maximum number of masters is determined by dividing the amount of time it takes a bus master to pass through the bus priority by the clock period.
This device must defer to all the others. However, it may transmit in any slot not reserved by the other SBI devices. This gives it the lowest average wait time under most circumstances. Only when there is heavy demand on the bus, which means that most of the time there is at least one pending request, will the priority 16 device not have the lowest average wait time.
The length of the memory read cycle follows from the bus timing. The Read signal begins to fall at 75 ns from the beginning of the third clock cycle (the middle of the second half of T3). Thus, memory must place the data on the bus no later than 55 ns from the beginning of T3.
Given the clock period, two clock cycles need to be inserted. From Figure 3, to insert two clock cycles, the Ready line can be driven low at the beginning of T2 and kept low for two clock periods. A 5 MHz clock corresponds to a clock period of 200 ns.
Therefore, the Write signal has a duration of ns. One wait state. Without the wait states, the instruction takes 16 bus clock cycles. The instruction requires four memory accesses, resulting in 8 wait states. The wait state extends the bus read cycle by ns, for a total duration of 0. On average, they consist of 20 bit items, 40 bit items, and 40 bytes. For the bit microprocessor, the number required is Thus, the Interrupt Acknowledge will start after ns.
Access must be made in a specific linear sequence. Direct access: Individual blocks or records have a unique address based on physical location.
Access is accomplished by direct access to reach a general vicinity plus sequential searching, counting, or waiting to reach the final location. Random access: Each addressable location in memory has a unique, physically wired-in addressing mechanism. The time to access a given location is independent of the sequence of prior accesses and is constant.
Because memory references tend to cluster, the data in the higher- level memory need not change very often to satisfy memory access requests.
Associative mapping permits each main memory block to be loaded into any line of the cache. In set-associative mapping, the cache is divided into a number of sets of cache lines; each main memory block can be mapped into any line in a particular set. The remaining two fields specify one of the blocks of main memory. These two fields are a line field, which identifies one of the lines of the cache, and a tag field, which identifies one of the blocks that can fit into that line.
A word field identifies a unique word or byte within a block of main memory. These two fields are a set field, which identifies one of the sets of the cache, and a tag field, which identifies one of the blocks that can fit into that set. Temporal locality refers to the tendency for a processor to access memory locations that have been used recently. Temporal locality is exploited by keeping recently used instruction and data values in cache memory and by exploiting a cache hierarchy.
Therefore, 4 bits are needed to identify the set number. Therefore, the set plus tag lengths must be 12 bits and therefore the tag length is 8 bits. Each block contains words. Therefore, 7 bits are needed to specify the word. Thus the cache consists of sets of 2 lines each. Therefore 8 bits are needed to identify the set number. For the Mbyte main memory, a bit address is needed.
Therefore, the set plus tag lengths must be 22 bits, so the tag length is 14 bits and the word field length is 4 bits. Address length: 24; number of addressable units: ; block size: 4; number of blocks in main memory: ; number of lines in cache: ; size of tag: 8. Address length: 24; number of addressable units: ; block size: 4; number of blocks in main memory: ; number of lines in cache: hex; size of tag: Address length: 24; number of addressable units: ; block size: 4; number of blocks in main memory: ; number of lines in set: 2; number of sets: ; number of lines in cache: ; size of tag: 9.
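An address decomposition like the ones computed above can be sketched as a helper function. The field widths used in the example call (24-bit address, 4-bit word field, 6-bit set field) are assumptions for illustration, not the problems' values.

```python
# Split a memory address into tag / set / word fields for a
# set-associative cache, using bit masks and shifts.
def split_address(addr, word_bits, set_bits, addr_bits=24):
    word = addr & ((1 << word_bits) - 1)
    set_index = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag_bits = addr_bits - word_bits - set_bits
    tag = (addr >> (word_bits + set_bits)) & ((1 << tag_bits) - 1)
    return tag, set_index, word

tag, s, w = split_address(0x123456, word_bits=4, set_bits=6)
print(hex(tag), s, w)  # 0x48d 5 6
```

For a direct-mapped cache the same split applies with the set field replaced by a line field; for a fully associative cache the set field disappears and the tag absorbs those bits.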
Each set in the cache includes 3 LRU bits and four lines. Each line consists of 4 words, a valid bit, and a tag. Bytes with addresses in the given range are stored in the cache. (d) Because two items with two different memory addresses can be stored in the same place in the cache.
The tag is used to distinguish between them. The bits are set according to the following rules with each access to the set. The replacement algorithm works as follows (Figure 4): the cache determines which of the pair of blocks was least recently used and marks it for replacement.
When the cache is initialized or flushed, all sets of three LRU bits are set to zero. The scheme divides the four lines in a set into two pairs (L0, L1) and (L2, L3).
Bit B0 is used to select the pair that has been least-recently used. Within each pair, one bit is used to determine which member of the pair was least-recently used. However, the ultimate selection only approximates LRU.
Consider the case in which the order of use was: L0, L2, L3, L1. The least-recently used pair is L2, L3 and the least-recently used member of that pair is L2, which is selected for replacement.
However, the least recently used line of all is L0. Depending on the access history, the algorithm will pick either the least recently used entry or the second least recently used entry. The most straightforward way to implement true LRU for a four-line set is to associate a two-bit counter with each line.
When an access occurs, the counter for that block is set to 0; all counters with values lower than the original value for the accessed block are incremented by 1. When a miss occurs and the set is not full, a new block is brought in, its counter is set to 0 and all other counters are incremented by 1.
When a miss occurs and the set is full, the block with counter value 3 is replaced; its counter is set to 0 and all other counters are incremented by 1. This approach requires a total of 8 bits.
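The two-bit-counter scheme just described can be sketched directly. The class name and access sequence are illustrative only; the update rules are the ones stated above.

```python
# True LRU for a small set via per-line counters: counter 0 marks the
# most recently used line, the highest counter marks the victim.
class LRUSet:
    def __init__(self, ways=4):
        self.counters = list(range(ways))  # arbitrary initial ordering

    def access(self, line):
        old = self.counters[line]
        # Increment every counter that was lower than the accessed
        # line's old value, then make the accessed line most recent.
        for i in range(len(self.counters)):
            if self.counters[i] < old:
                self.counters[i] += 1
        self.counters[line] = 0

    def victim(self):
        return self.counters.index(max(self.counters))

s = LRUSet()
for line in (0, 2, 3, 1):  # the access order used in the text's example
    s.access(line)
print(s.victim())  # 0
```

Replaying the access order L0, L2, L3, L1 correctly identifies L0 as the victim, which is exactly the case where the tree-based approximation above picks the wrong line.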
In general, for a set of N blocks, the above approach requires 2N bits. The scheme operates as follows. Consider a matrix R with N rows and N columns, and take the upper-right triangular portion of the matrix, not counting the diagonal. When block i is referenced, the bits of row i are set to 1 and then the bits of column i are set to 0. The LRU block is the one for which the row is entirely 0 (for those bits that are in the row; the row may be empty) and for which the column is entirely 1 (for those bits that are in the column; the column may be empty).
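A sketch of the reference-matrix scheme, assuming the standard update rule (on a reference to block i, set row i to 1, then clear column i). A full N × N matrix is used here for clarity, even though only the triangle above the diagonal actually needs to be stored, since R[i][j] and R[j][i] are complements.

```python
# Reference-matrix LRU: the block whose row is all zeros was
# referenced longest ago (or never).
def make_matrix(n):
    return [[0] * n for _ in range(n)]

def access(R, i):
    n = len(R)
    for j in range(n):
        R[i][j] = 1   # set row i ...
    for j in range(n):
        R[j][i] = 0   # ... then clear column i (this also clears R[i][i])

def lru(R):
    for i, row in enumerate(R):
        if all(bit == 0 for bit in row):
            return i

R = make_matrix(4)
for block in (0, 2, 3, 1):  # same access order as the counter example
    access(R, block)
print(lru(R))  # 0
```

Because each new reference sets a whole row and every later reference clears one of that row's bits, the oldest referenced block is the one whose row has been cleared back to all zeros.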
Because the block size is 16 bytes and the word size is 1 byte, this means there are 16 words per block. We will need 4 bits to indicate which word we want out of a block. That means each cache slot contains 16 bytes. We need to pick any address where the slot is the same but the tag (and, optionally, the word offset) is different.
We no longer need to identify which slot a memory block might map to, because a block can be in any slot and we will search each cache slot in parallel. The word offset must be 4 bits to address each individual word in the 16-word block.
This leaves 16 bits for the tag. As computed in part (a), we have a given number of cache slots. If we implement a two-way set-associative cache, then we put two cache slots into one set. Once we address a set, we simultaneously search both cache slots to see if one has a tag that matches the target. Initially, arbitrarily set the four counter values to 0, 1, 2, and 3 respectively. When a hit occurs, the counter of the block that is referenced is set to 0. When a miss occurs, the block in the set whose counter value is 3 is replaced and its counter set to 0.
All other counters in the set are incremented by 1. If the average line that is written at least once is written more than a certain number of times, write-back is more efficient. A reference to the first instruction is immediately followed by a reference to the second.
The ten accesses to a[i] within the inner for loop occur within a short interval of time. If a word is in M2 but not M1, then a block of data is transferred from M2 to M1 and then read. The cache consists of 16 sets; each set consists of 4 slots; each slot consists of 64 words. The referenced locations in main memory occupy blocks 0 through 67. On the first fetch sequence, blocks 0 through 15 are read into sets 0 through 15; blocks 16 through 31 are read into sets 0 through 15; blocks 32 through 47 are read into sets 0 through 15; blocks 48 through 63 are read into sets 0 through 15; and blocks 64 through 67 are read into sets 0 through 3.
Because each set has 4 slots, there is no replacement needed through block 63. The last group of 4 blocks involves a replacement. On each successive pass, replacements will be required in sets 0 through 3, but all of the blocks in sets 4 through 15 remain undisturbed.
Thus, on each successive pass, 48 blocks are undisturbed, and the remaining 20 must be read in. Let T be the time to read 64 words from cache. Then 10T is the time to read 64 words from main memory. If a word is not in the cache, then it can only be read by first transferring the word from main memory to the cache and then reading the cache.
Thus the time to read a 64-word block from cache if it is missing is 11T. We can now express the improvement factor as follows. From Equation 4. Under the initial conditions, using Equation 4. As the time for access when there is a cache miss becomes larger, it becomes more important to increase the hit ratio.
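The underlying two-level access-time model can be written out explicitly. The hit ratios and access times below are assumed values for illustration, not the problem's numbers.

```python
# Average access time for a two-level memory: every access tries the
# cache first; a miss adds the main-memory access on top.
def avg_access_time(hit_ratio, t_cache, t_memory):
    hit_time = t_cache
    miss_time = t_cache + t_memory  # word is fetched via the cache
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

print(avg_access_time(0.95, 10, 100))  # ~15 ns with the assumed times
print(avg_access_time(0.80, 10, 100))  # ~30 ns -- the miss penalty dominates
```

Dropping the hit ratio from 0.95 to 0.80 doubles the average access time in this model, which is the point made above: the larger the miss penalty, the more the hit ratio matters.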
First, 2. Then, the required line is read into the cache. Then an additional 2. Consider the execution of instructions. Under write-through, this creates cache references read references and 32 write references. On average, the read references result in 0. For each read miss, a line of memory must be read in, generating 5. For write misses, a single word is written back, generating 32 words of traffic. Total traffic: For write back, instructions create cache references and thus 6 cache misses.
For write-through: [ 0. At that rate, the memory traffic is about equal for the two strategies. For a lower miss rate, write-back is superior. For a higher miss rate, write-through is superior.
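The write-through versus write-back comparison can be captured in a small traffic model. Every parameter here (reference count, write fraction, miss rate, line size, dirty fraction) is an assumption chosen for illustration rather than the text's numbers.

```python
# Memory traffic (in words) generated by each write policy.
def traffic_write_through(refs, write_frac, miss_rate, line_words):
    reads = refs * (1 - write_frac)
    writes = refs * write_frac
    # Every write goes to memory (1 word); every read miss fills a line.
    return writes * 1 + reads * miss_rate * line_words

def traffic_write_back(refs, miss_rate, line_words, dirty_frac):
    misses = refs * miss_rate
    # Every miss fills a line; a dirty victim also writes a line back.
    return misses * line_words * (1 + dirty_frac)

wt = traffic_write_through(1000, write_frac=0.25, miss_rate=0.02, line_words=8)
wb = traffic_write_back(1000, miss_rate=0.02, line_words=8, dirty_frac=0.5)
print(wt, wb)  # write-back generates less traffic at this low miss rate
```

Raising the miss rate in this model increases write-back traffic (whole lines move on every miss) much faster than write-through traffic, reproducing the crossover described above.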
One clock cycle equals 60 ns, so a cache access takes ns and a main memory access takes ns. The effective length of a memory cycle is 0. The calculation is now 0. Clearly the performance degrades. However, note that although the memory access time increases by ns, the average access time increases by only 12 ns. For a 1 MIPS processor, the average instruction takes ns to fetch and execute.
On average, an instruction uses two bus cycles, so the bus utilization can be computed accordingly. For only half of the instructions must the bus be used for instruction fetch. This reduces the waiting time for other bus requestors, such as DMA devices and other microprocessors. An SRAM cell is a digital device, in which binary values are stored using traditional flip-flop logic-gate configurations.
Erasure is performed by shining an intense ultraviolet light through a window that is designed into the memory chip. EEPROM is a read-mostly memory that can be written into at any time without erasing prior contents; only the byte or bytes addressed are updated. In addition, it is possible to erase just blocks of memory rather than an entire chip. However, flash memory does not provide byte-level erasure. OE: output enable. WE: write enable. Each bit of the syndrome is 0 or 1 according to whether or not there is a match in that bit position between the two inputs.
If the syndrome contains all 0s, no error has been detected. If the syndrome contains one and only one bit set to 1, then an error has occurred in one of the 4 check bits. No correction is needed. If the syndrome contains more than one bit set to 1, then the numerical value of the syndrome indicates the position of the data bit in error.
This data bit is inverted for correction. It requires fewer pins on the package (only one data-out line); therefore, a higher density of bits can be achieved for a given size package. Also, it is somewhat more reliable because it has only one output driver. These benefits have led to the traditional use of 1-bit-per-chip for RAM. This saves on cost and is sufficient reason to adopt that organization.
The fraction of time devoted to memory refresh can be computed from the refresh timing. The maximum data rate is 1 bit every 100 ns, which is 10 Mbps. The length of a clock cycle follows from the clock rate. Mark the beginning of T1 as time 0.
Address Enable returns to a low, and RAS goes active 50 ns later. This can easily be met by DRAMs with standard access times. Output leads are O3, O2, O1, O0. Pulse h: read location 0; pulse i: read location 1; pulse j: read location 2; pulse k: read location 3; pulse l: read location 4; pulse m: read location 5. Now suppose that the only error is in C8, so that the fetched word has that bit flipped. Check bit 8 is calculated from the values in bit numbers 12, 11, 10, and 9. Check bit 4 is calculated from the values in bit numbers 12, 7, 6, and 5. Check bit 2 is calculated from the values in bit numbers 11, 10, 7, 6, and 3. Check bit 1 is calculated from the values in bit numbers 11, 9, 7, 5, and 3. Thus, the check bits are: 0 0 1 0.
Thus, the data word read from memory is recovered. The minimum value of K that satisfies this condition follows. Comparing the fetched and recomputed check bits:

C16 C8 C4 C2 C1
 1   1   1   1   0   (fetched)
 1   1   0   0   1   (recomputed)
 0   0   1   1   1   (syndrome)

The result is an error identified in bit position 7, which is data bit 4.
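The check-bit and syndrome computation in these worked examples can be sketched in code. The 12-bit layout below (check bits in positions 1, 2, 4, 8; data bits elsewhere) follows the convention used above, but the data values themselves are arbitrary.

```python
# Hamming single-error correction: each data-bit position contributes to
# the check bits named by its binary expansion; the syndrome (stored XOR
# recomputed check bits) is the numeric position of a flipped bit.
def compute_check_bits(word, n_positions):
    """word: dict mapping non-power-of-two positions to data bits."""
    checks = {}
    c = 1
    while c <= n_positions:
        parity = 0
        for pos, bit in word.items():
            if pos & c:         # position pos participates in check bit c
                parity ^= bit
        checks[c] = parity
        c <<= 1
    return checks

# 8 data bits in positions 3, 5, 6, 7, 9, 10, 11, 12 (12-bit code word)
data = {3: 0, 5: 1, 6: 0, 7: 1, 9: 0, 10: 1, 11: 0, 12: 1}
stored = compute_check_bits(data, 12)

data[7] ^= 1                    # corrupt the data bit in position 7
recomputed = compute_check_bits(data, 12)

syndrome = sum(c for c in stored if stored[c] != recomputed[c])
print(syndrome)  # 7 -- the corrupted position, ready to be inverted
```

An all-zero syndrome means no error, a syndrome equal to a single check-bit position means a bad check bit, and any other value names the data bit to invert, exactly as the rules above state.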
Ability to support lower fly heights (described subsequently). Better stiffness to reduce disk dynamics. Greater ability to withstand shock and damage. Pulses are sent to the write head, and magnetic patterns are recorded on the surface below, with different patterns for positive and negative currents. An electric current in the wire induces a magnetic field across the gap, which in turn magnetizes a small area of the recording medium. Reversing the direction of the current reverses the direction of the magnetization on the recording medium.
The MR material has an electrical resistance that depends on the direction of the magnetization of the medium moving under it. By passing a current through the MR sensor, resistance changes are detected as voltage signals. An increase in density is achieved with multiple zone recording, in which the surface is divided into a number of zones, with zones farther from the center containing more bits than zones closer to the center. Data are transferred to and from the disk in sectors.
For a disk with multiple platters, the set of all the tracks in the same relative position on the platter is referred to as a cylinder. Once the track is selected, the disk controller waits until the appropriate sector rotates to line up with the head. The time it takes for the beginning of the sector to reach the head is known as rotational delay. The sum of the seek time, if any, and the rotational delay equals the access time, which is the time it takes to get into position to read or write.
Once the head is in position, the read or write operation is then performed as the sector moves under the head; this is the data transfer portion of the operation and the time for the transfer is the transfer time. RAID is a set of physical disk drives viewed by the operating system as a single logical drive.
Data are distributed across the physical drives of an array. Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure. The strips are mapped round robin to consecutive array members. A set of logically consecutive strips that maps exactly one strip to each array member is referred to as a stripe.
For higher levels, redundancy is achieved by the use of error-correcting codes. Typically, the spindles of the individual drives are synchronized so that each disk head is in the same position on each disk at any given time. At constant linear velocity (CLV), the disk rotates more slowly for accesses near the outer edge than for those near the center.
Thus, the capacity of a track and the rotational delay both increase for positions nearer the outer edge of the disk. Bits are packed more closely on a DVD. The spacing between loops of a spiral on a CD is 1. The DVD uses a laser with shorter wavelength and achieves a loop spacing of 0.
The result of these two improvements is about a seven-fold increase in capacity, to about 4.7 GB. The DVD employs a second layer of pits and lands on top of the first layer. A dual-layer DVD has a semireflective layer on top of the reflective layer, and by adjusting focus, the lasers in DVD drives can read each layer separately.
This technique almost doubles the capacity of the disk, to about 8.5 GB. The lower reflectivity of the second layer limits its storage capacity so that a full doubling is not achieved. This brings total capacity up to 17 GB. In this technique (serpentine recording), when data are being recorded, the first set of bits is recorded along the whole length of the tape. When the end of the tape is reached, the heads are repositioned to record a new track, and the tape is again recorded on its whole length, this time in the opposite direction.
That process continues, back and forth, until the tape is full. Recognize that each of the N tracks is equally likely to be requested. This follows directly from the last equation. The disk will need the seek time of 8 ms to find cylinder i, plus the rotational delay and transfer time.
Then, the time needed to move to the next adjoining cylinder is 1. Assume a rotational latency before each track. If we assume that the head starts at track 0, then the calculations are simplified.
If the requested track is track 0, then the seek time is 0; if the requested track is the highest-numbered track, then the seek time is the time to traverse all the intervening tracks. The average seek time follows. At 7,200 rpm, there is one revolution every 8.33 ms.
Therefore, the average rotational delay is 4.17 ms. With the given number of sectors per track and a time for one complete revolution of 8.33 ms, the transfer time per sector follows. The result is the sum of the preceding quantities.
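The access-time arithmetic above (average seek, plus half a revolution of rotational delay, plus one sector's transfer time) can be packaged as a function. The drive parameters in the example call are assumed values consistent with the rotation speed discussed above, not the problem's exact figures.

```python
# Average time to read one randomly located sector from a disk.
def disk_access_ms(avg_seek_ms, rpm, sectors_per_track):
    rev_ms = 60_000 / rpm                  # one full revolution, in ms
    rotational_delay = rev_ms / 2          # on average, half a revolution
    transfer = rev_ms / sectors_per_track  # one sector passes the head
    return avg_seek_ms + rotational_delay + transfer

print(round(disk_access_ms(8.0, 7200, 500), 3))  # 12.183
```

Note how the seek and rotational terms dominate: the transfer of a single sector is two orders of magnitude smaller, which is why sequential access is so much cheaper than random access.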
The time consists of the following components: sector read time; track access time; rotational delay; and sector write time. The rotational delay is the time required for the head to line up with sector 1 again. The head movement time of 2 ms overlaps with this delay. The time to read or write an entire track is simply the time for a single revolution. Between the read and the write there is a head movement time of 2 ms to move from track 8 to track 9. During this time the head moves past 3 sectors and most of a fourth sector.
However, because the entire track is buffered, sectors can be written back in a different sequence from the read sequence. Thus, the write can start with sector 5 of track 9, which is reached shortly after the head movement completes. That is, the CD scheme has a somewhat higher data storage density. So the size of the backup would have to be about 5 TB for tape to be less expensive.
One strategy is to keep a lot of backup sets. Machine readable: Suitable for communicating with equipment. Communication: Suitable for communicating with remote devices.
Processor communication. Device communication. Data buffering. Error detection. Otherwise, the process is suspended pending the interrupt and other work is performed.
The processor sends a request for the transfer of a block of data to the DMA module and is interrupted only after the entire block has been transferred. The full range of addresses may be available for both. Typically, this would allow a large number of devices to be addressed.
However, an opcode specifies either an input or an output operation, so it is possible to reuse the addresses, giving separate sets of input port addresses and output port addresses. The first device requires only one port for data, while the second device requires an input data port and an output data port.
Because each device requires one command and one status port, the total number of ports is seven. The printing rate is slowed to 5 cps.