Construction of the decade. Exploring the AMD Bulldozer Processor Architecture

As you know, Intel has followed its “Tick-Tock” development strategy for more than five years, changing the manufacturing process in odd years and the microarchitecture in even years. AMD follows a completely different policy, improving its models as new technologies become available. The company carried out its last microarchitecture update almost four years ago, when it released the Phenom CPU on K10, which has since moved through three process nodes: 65 nm for Agena, 45 nm for Deneb and 32 nm for Llano. Nevertheless, sooner or later the potential of any design is exhausted and the need for a radical renewal arises.

Unlike Intel, which systematically refreshes its microarchitecture every two years, AMD prefers to do so a little less often, but with more changes and improvements each time. In fact, since the first Athlons based on K7 appeared, there have been only two updates, both significant and radical: K8, introduced in 2003 as the basis for Athlon 64, and the already mentioned K10 in the Phenom and Athlon II families. Yes, the company later raised frequencies, enlarged caches, added cores and changed process nodes, but the underlying structure, the "heart" of entire CPU families, remained unchanged.

A bit of theory

The new Bulldozer microarchitecture, debuting in AMD FX processors, differs significantly from its predecessor, K10. It also breaks with the pattern of previous updates, in which engineers sought to correct the weaknesses of the existing design and build on its strengths. Looking at K10, you could see the outlines and general topology of K8 and K7 in it; do the same with Sandy Bridge and you will find a number of features of the earlier Nehalem and Conroe.

Bulldozer, by contrast, is immediately recognizable as something quite different from both K10 and other x86-compatible microarchitectures. Against the background of its predecessors, the newcomer looks as unusual as an airplane next to a helicopter. Let's look at it in more detail. I should say up front that I will try to explain the essence of the changes without descending into the technical jungle: most readers find those subtleties boring, and those who need them already know where to find them.

The main difference between Bulldozer and other current processor microarchitectures lies in the layout of the x86 cores, which are now grouped in pairs into a "module" and share some resources: the floating-point unit (FPU), the second-level cache (L2) and the so-called "front end", which is discussed below. Each module of the new microarchitecture is thus something between a conventional dual-core CPU and a single processor core with Hyper-Threading.

In a sense, this is even a development of the Hyper-Threading idea. But where Hyper-Threading has two threads split the same set of hardware resources, in a Bulldozer module the two threads share some resources and have others for their exclusive use. The balance is well chosen: all the "heavy" and "expensive" blocks (in terms of transistor budget) are shared between the two cores, while the x86 cores themselves are duplicated, since each of them takes only about 12% of the module's transistors.

For integer and address operations, each module acts as two full-fledged, independent cores, and FPU resources are divided between them during floating-point calculations. These same cores in effect serve the FPU: they send it instructions for execution, load and store its data, and track and retire its micro-operations (MOPs), since the threads, the out-of-order execution machinery and the first-level data caches (L1D) all belong to them.

Obviously, the main advantage of this scheme over a single core is in increased performance under multi-threaded loads, especially with an emphasis on integer calculations. Let's try to consider the main blocks of Bulldozer in more detail.

Front end

In essence, the "front end" is the set of logic that prepares instructions for execution. It includes the branch prediction units, whose accuracy determines how often the CPU sits idle waiting for the necessary data to arrive from RAM or the caches; the first-level instruction cache (L1I); and the decoder, which "translates" x86 instructions into a form the execution units understand: MOPs.

The changes to these blocks are mixed. On the one hand, branch prediction accuracy has improved. When decoding from the cache, data is read in 32-byte chunks, as in K10, which is good and twice as much as in Sandy Bridge. Instructions are now decoded four at a time instead of three as in K7-K10, one of the most important and long-awaited improvements in the microarchitecture. On the other hand, AMD has only now introduced a 4-wide decoder, whereas Intel had one five years ago, in Conroe (Core 2). And the instruction cache has the same size and associativity (64 KB, 2-way) as in K10, where it had carried over unchanged from K7.

Also, do not forget that both the instruction cache and the decoder now serve two threads rather than one, so under an intensive multi-threaded load their capacity is effectively halved for each thread. In summary, the new "front end" looks better than its predecessors in some respects and worse in others, and will show its strengths and weaknesses depending on the nature of the task.
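The point about the shared decoder can be put into rough numbers. The sketch below is a deliberately crude model: it assumes decode slots are split evenly between active threads, which is a simplification of my own and not a description of the real arbitration logic, and the function name is hypothetical.

```python
def decode_slots_per_thread(decoder_width: int, active_threads: int) -> float:
    """Average decode slots per cycle for each thread, assuming an even split."""
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return decoder_width / active_threads


# Bulldozer module: one 4-wide decoder shared by up to two threads.
bulldozer_one_thread = decode_slots_per_thread(4, 1)
bulldozer_two_threads = decode_slots_per_thread(4, 2)

# K10 core: a 3-wide decoder that always serves a single thread.
k10_one_thread = decode_slots_per_thread(3, 1)

print(f"Bulldozer, 1 thread per module:  {bulldozer_one_thread:.1f} slots/cycle")
print(f"Bulldozer, 2 threads per module: {bulldozer_two_threads:.1f} slots/cycle each")
print(f"K10, 1 thread per core:          {k10_one_thread:.1f} slots/cycle")
```

Under this assumption a lone thread gets more decode bandwidth than on K10, while two busy threads in one module each get less, which is the trade-off the paragraph above describes.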

X86 core

These blocks, two per module, are the distinguishing feature of Bulldozer and are what allow one module to process two instruction streams. They contain the main components of an x86 core with out-of-order execution: the buffer of MOPs received from the decoder (Scheduler), the unit that retires executed instructions (Retire), the integer execution units and address generation units (ALUs and AGUs, two of each per x86 core), the first-level data cache (L1D) and the load/store unit (LSU).

In many ways, the Bulldozer x86 core resembles the K10 integer block, but there are a number of noticeable and debatable changes. First, the number of ALUs and AGUs has been reduced from three to two compared with K10. On the one hand, this cuts peak theoretical performance by a factor of one and a half; on the other, that peak was practically unattainable in real code, so the loss is small, though real. Second, the data cache has become four times smaller than in K10, 16 KB instead of 64 KB, but its associativity has grown from 2-way to 4-way. You could call it a justified trade of capacity for speed.

The LSU, meanwhile, has improved across the board: both the nominal and the effective buffer capacity have grown significantly, and the width of store operations has doubled.
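To see what the L1D change means structurally, the number of sets in each cache can be worked out from size and associativity. This is a minimal sketch: the 64-byte cache line is my assumption (the text does not state the line size), and `cache_sets` is a hypothetical helper.

```python
KB = 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache (line size is an assumption)."""
    return size_bytes // (ways * line_bytes)


k10_l1d_sets = cache_sets(64 * KB, ways=2)        # 64 KB, 2-way
bulldozer_l1d_sets = cache_sets(16 * KB, ways=4)  # 16 KB, 4-way

# Peak integer throughput ratio from the ALU count alone: three units vs two.
alu_peak_ratio = 3 / 2

print(f"K10 L1D sets:       {k10_l1d_sets}")
print(f"Bulldozer L1D sets: {bulldozer_l1d_sets}")
print(f"K10 / Bulldozer peak ALU ratio: {alu_peak_ratio}")
```

With these assumptions the Bulldozer L1D has far fewer sets, but each set can hold twice as many lines, so addresses that collide on the same set are less likely to evict one another.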

FPU

The floating-point unit is perhaps one of the most important blocks of the processor. As you might guess, it is responsible for floating-point operations, as well as for executing all versions of the SSE instruction sets, AVX, FMA and a few individual instructions. The Bulldozer FPU is the most powerful and capable to date, and it is thanks in large part to it that AMD hopes to beat competing Intel solutions based on the Sandy Bridge microarchitecture.

The Bulldozer FPU is built around two 128-bit FMAC units. Unlike K10, where separate units handled addition and multiplication, these are universal and can execute the full range of supported instructions. In other words, AMD has moved from an asymmetric arrangement of FPU execution units to a symmetric one. When FPU resources are shared between the two x86 cores, each core can work with its own FMAC unit.

The only exception is 256-bit AVX instructions, for which both units work together as one. It is worth noting that with 256-bit AVX operations the FPU's per-cycle performance equals that of the Sandy Bridge FPU, while with 128-bit AVX operations its execution rate is twice as high.
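The 256-bit versus 128-bit comparison can be reproduced with a simple peak-throughput model. The sketch below rests on several assumptions of mine: single-precision (32-bit) elements, code that consists entirely of FMA instructions on Bulldozer, and a Sandy Bridge core that issues one vector add and one vector multiply per cycle without FMA. It compares one Bulldozer module with one Sandy Bridge core and says nothing about real workloads.

```python
def lanes(vector_bits: int, element_bits: int = 32) -> int:
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def bulldozer_module_flops_per_cycle(vector_bits: int) -> int:
    """Peak FLOPs per cycle for one module: two 128-bit FMAC units.

    A 256-bit operation occupies both units at once; an FMA counts as
    two floating-point operations per lane.
    """
    total_fmac_bits = 2 * 128
    ops_per_cycle = total_fmac_bits // vector_bits  # 2 at 128-bit, 1 at 256-bit
    return ops_per_cycle * lanes(vector_bits) * 2


def sandy_bridge_core_flops_per_cycle(vector_bits: int) -> int:
    """Peak FLOPs per cycle for one core: one add plus one multiply (assumed)."""
    return 2 * lanes(vector_bits)


for width in (256, 128):
    bd = bulldozer_module_flops_per_cycle(width)
    snb = sandy_bridge_core_flops_per_cycle(width)
    print(f"{width}-bit: Bulldozer module {bd}, Sandy Bridge core {snb}, ratio {bd / snb:.1f}")
```

In this model the two designs tie at 256 bits, and Bulldozer comes out ahead by a factor of two at 128 bits because both FMAC units can then work independently.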

Besides speed, functionality matters too. As already mentioned, the Bulldozer FPU supports FMA (fused multiply-add) instructions of the form A = B x C + D. The result of the multiplication is not rounded before the addition, which improves the accuracy of calculations. Overall, the FPU is better in every respect than in previous AMD microarchitectures, and the engineers can be proud of their work.
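The effect of skipping the intermediate rounding is easy to demonstrate in software. The sketch below does not use any hardware FMA instruction; it emulates a fused operation with exact rational arithmetic from Python's standard `fractions` module and rounds once at the end. The helper name and the input values are my own, chosen so that the product falls exactly between two representable doubles.

```python
from fractions import Fraction


def fused_multiply_add(b: float, c: float, d: float) -> float:
    """Emulate FMA: compute b*c + d exactly, then round once to a double."""
    return float(Fraction(b) * Fraction(c) + Fraction(d))


b = 1.0 + 2.0**-27
c = 1.0 - 2.0**-27
d = -1.0

# Separate multiply and add: the product 1 - 2**-54 is rounded to 1.0 first,
# so the small term is lost before the addition happens.
unfused = b * c + d

# Fused: the exact product is kept, and only the final sum is rounded.
fused = fused_multiply_add(b, c, d)

print(f"unfused: {unfused!r}")
print(f"fused:   {fused!r}")
```

The unfused result is exactly zero, while the fused result keeps the tiny negative remainder of -2**-54, which is the kind of accuracy gain the paragraph above refers to.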

Caches and northbridge

The cache subsystem has also undergone several important changes compared to K10. As already mentioned, the first level data cache (L1D) exchanged volume for associativity, and the instruction cache (L1I) remained virtually unchanged. The second-level (L2) cache, which was previously used solely by one core, is now shared between the two x86 cores of the module. In addition, the L2 cache has grown from 512 KB to 2 MB compared to the K10. The associativity level remained the same, 16-way.

This means that the eight-core, four-module Bulldozer microarchitecture CPU uses four L2 caches with a total capacity of 8 MB. But, most likely, the growth in volume and the need to share resources between the two cores also left a negative imprint on the access time to the second-level cache. The L3 cache and memory controller, like the K10, operate at their own frequency, which is lower than the modules' frequencies. For the announced processors, it is 2-2.2 GHz, depending on the model. This is less than Sandy Bridge, where the integrated memory controller and L3 cache operate at the core frequency. Bulldozer's L3 cache size is now 8 MB, and its associativity is 64-way, which is a third more than Deneb's (6 MB and 48-way, respectively).

It is also worth recalling that the caches of AMD processors are organized according to the so-called exclusive scheme: data is not duplicated across cache levels, so the combined capacity of all levels can be considered effective. To sum up, the changes in L1 and L2 are significant but debatable, while L3 looks like a logical evolution of the K10 design.
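The cache figures above can be checked with a few lines of arithmetic. This is only a sketch: it takes the exclusive scheme at face value for L2 and L3, as the text describes it, and the constant names are mine.

```python
MODULES = 4
L2_PER_MODULE_MB = 2
BULLDOZER_L3_MB = 8
BULLDOZER_L3_WAYS = 64
DENEB_L3_MB = 6
DENEB_L3_WAYS = 48

# Four modules, each with its own 2 MB L2.
total_l2_mb = MODULES * L2_PER_MODULE_MB

# With no duplication between levels, L2 and L3 capacities simply add up.
effective_l2_l3_mb = total_l2_mb + BULLDOZER_L3_MB

# Relative growth of L3 size and associativity over Deneb.
l3_size_growth = BULLDOZER_L3_MB / DENEB_L3_MB - 1
l3_ways_growth = BULLDOZER_L3_WAYS / DENEB_L3_WAYS - 1

print(f"Total L2: {total_l2_mb} MB")
print(f"Effective L2 + L3: {effective_l2_l3_mb} MB")
print(f"L3 size growth: {l3_size_growth:.0%}, associativity growth: {l3_ways_growth:.0%}")
```

Both the size and the associativity of L3 grow by the same one third, which matches the "a third more" remark for Deneb.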

The AMD FX CPU memory controller has not undergone significant changes, it is still dual-channel, and the nominally supported frequency of DDR3 memory modules has increased to 1866 MHz.

Turbo Core 2.0

The automatic overclocking technology that debuted in the AMD Phenom II X6 has been significantly improved and is now in many ways similar to the one used in the Sandy Bridge line. A dedicated block in the processor monitors the CPU's current power consumption and core load and adjusts the frequency of the modules' cores accordingly. If power consumption does not exceed the TDP, the frequencies of all cores can rise above the base value by a set amount.

For example, for AMD FX-8150, the frequency increases from the standard 3.6 GHz to 3.9 GHz for all eight cores. And when the processor consumption is below TDP, and some of the cores are idle, then the frequencies of the loaded cores can rise even higher, up to 4.2 GHz, in the case of the AMD FX-8150. In fairness, it is worth recalling that a similar technology is used in AMD Llano, which takes into account the consumption of not only CPU cores, but also the integrated graphics processor.
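The decision logic described above can be sketched as a small function. This is an illustration only, not AMD's actual algorithm: the names are hypothetical, the power readings are invented inputs, and the rule that the top bin requires at least half of the cores to be idle is an assumption on my part. The frequencies and the 125 W TDP are the FX-8150 figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurboProfile:
    base_ghz: float
    all_core_ghz: float
    max_ghz: float
    tdp_w: float


FX_8150 = TurboProfile(base_ghz=3.6, all_core_ghz=3.9, max_ghz=4.2, tdp_w=125.0)


def target_frequency(profile: TurboProfile, power_w: float,
                     active_cores: int, total_cores: int = 8) -> float:
    """Pick a target clock from measured power and the number of busy cores."""
    if power_w >= profile.tdp_w:
        # No headroom: stay at the base clock.
        return profile.base_ghz
    if active_cores <= total_cores // 2:
        # Headroom and enough idle cores (assumed threshold): top turbo bin.
        return profile.max_ghz
    # Headroom with most cores busy: all-core turbo.
    return profile.all_core_ghz


print(target_frequency(FX_8150, power_w=130.0, active_cores=8))
print(target_frequency(FX_8150, power_w=100.0, active_cores=8))
print(target_frequency(FX_8150, power_w=100.0, active_cores=4))
```

A real implementation would also take temperature and voltage limits into account; the sketch keeps only the two inputs the text mentions.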

Theory - conclusion

What can be said in summary about the new microarchitecture? As shown above, the changes are numerous, deep and double-edged. There is no doubt that Bulldozer is a genuinely new AMD microarchitecture. That also means its results may be very uneven: in some places slightly slower than K10, in others much faster.

Nevertheless, in terms of support for modern instruction sets, automatic overclocking technologies and focus on multi-threaded workloads, the new AMD design is not inferior to the competing Sandy Bridge, and in some cases looks even more attractive. And although Bulldozer clearly has a number of weaknesses, they can easily be eliminated in future revisions.

This is likely to be the basis of the company's strategy for the coming years. Bulldozer can be seen as an investment in the future: it is the skeleton on which the next microarchitectures will grow "meat" and deliver performance gains. According to current plans, AMD will update the microarchitecture of its processors annually rather than once every few years, which should bring a 10-15% increase in performance along with better energy efficiency in future products.

Separately, I would like to mention the distribution of threads among the cores. Windows 7 in its current form is not optimized for Bulldozer processors and cannot distribute threads correctly, which in some cases costs performance: either the CPU cannot use its clock boost technology, or dependent threads communicate through L3 rather than the faster L2 because they were assigned to cores in different modules.

AMD states in its materials that the Windows 8 scheduler already handles Bulldozer correctly and that the performance advantage over Windows 7 can reach 10% in some cases, which, you will agree, is a lot. Perhaps Microsoft will also release a patch for Windows 7 that teaches this popular operating system to distribute threads properly on the new AMD processors.
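What "distributing threads correctly" might mean can be sketched with a toy placement policy. Everything here is hypothetical: the core numbering (cores 2m and 2m+1 belong to module m), the function names and the policy itself are my assumptions, not a description of the Windows scheduler.

```python
def module_of(core: int) -> int:
    """Module index of a core, assuming cores 2m and 2m+1 form module m."""
    return core // 2


def spread_across_modules(n_threads: int, n_modules: int = 4) -> list[int]:
    """Place independent threads one per module first, then fill second cores.

    Keeping modules half-empty leaves shared resources to a single thread
    and lets idle cores create turbo headroom.
    """
    first_cores = [2 * m for m in range(n_modules)]
    second_cores = [2 * m + 1 for m in range(n_modules)]
    order = first_cores + second_cores
    if n_threads > len(order):
        raise ValueError("more threads than cores")
    return order[:n_threads]


def shares_l2(core_a: int, core_b: int) -> bool:
    """Two cores share an L2 cache only if they sit in the same module."""
    return module_of(core_a) == module_of(core_b)


print(spread_across_modules(4))  # one thread per module
print(shares_l2(0, 1))           # same module: data exchange can stay in L2
print(shares_l2(1, 2))           # different modules: data exchange goes via L3
```

For dependent threads the opposite choice, packing the pair into one module, keeps their shared data in L2; a module-aware scheduler has to weigh the two cases against each other.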

Now is the time to finish with the theory and see how the new AMD flagship can please in practice.

Testing tools and methodology

The speed of the processor-chipset-memory bundle was evaluated by the following applications:

Cinebench 10;
Cinebench 11.5;
POV-Ray (All CPU, total seconds);
TrueCrypt (Serpent-Twofish-AES);
wPrime 2.00;
x264 v3 (outdated version, without aggressive optimizations for multithreading);
x264 v4 (new version, well optimized for multithreading with new codecs);
WinRAR;
Photoshop CS5 x64 (application of a sequence of several dozen filters);
Autodesk Revit Architecture 2012 (visualization of a 3D drawing of a house).

Test bench

Several systems took part in the testing, built from a wide range of components, including motherboards. The table below gives a full description of the test benches and the operating modes of each configuration.

Motherboard	NB frequency, MHz	Chipset	Memory frequency, MHz	Cores	Bus frequency, MHz	Multiplier	Turbo	Processor / mode
ASUS Crosshair V	2200	FX990	1333	8	200	21	4200 MHz	FX8150 3600 MHz
MSI 990FXA-GD80	2000	FX990	1333	6	200	16.5-18.5	3700 MHz	Phenom II 1100 3300 MHz
MSI 990FXA-GD80	2000	FX990	1333	4	200	18.5	-	Phenom II 980 3700 MHz
MSI 990FXA-GD80	2000	FX990	1333	4	200	15.5	-	Athlon II 645 3100 MHz
MSI A75MA-G55	-	A75	1333	4	100	29	-	A8 3850 2900 MHz
MSI A75MA-G55	-	A75	1333	4	100	24-27	2700 MHz	A8 3800 2400 MHz
MSI A75MA-G55	-	A75	1333	4	100	26	-	A6 3650 2600 MHz
MSI A75MA-G55	-	A75	1333	3	100	21-24	2400 MHz	A6 3500 2100 MHz
MSI A75MA-G55	-	A75	1333	2	100	27	-	A4 3400 2700 MHz
MSI Z68A-GD80	-	Z68	1333	4	100	34-38	3800 MHz	i7 2600k 3400 MHz
MSI Z68A-GD80	-	Z68	1333	4	100	33-37	3700 MHz	i5 2500 3300 MHz
MSI Z68A-GD80	-	Z68	1333	4	100	31-34	3400 MHz	i5 2400 3100 MHz
ASUS P6X58D	2667	X58	1333	4	133	23	3060 MHz	i7 930 2800 MHz
MSI Z68A-GD80	-	Z68	1333	2	100	31	-	i3 2100 3100 MHz
ASUS Crosshair V	2200	FX990	1866	8	200	21	4200 MHz	FX8150 3600 MHz 1866
MSI A75MA-G55	-	A75	1866	4	100	29	-	A8 3850 2900 MHz 1866
MSI A75MA-G55	-	A75	1866	4	100	24-27	2700 MHz	A8 3800 2400 MHz 1866
MSI A75MA-G55	-	A75	1866	4	100	26	-	A6 3650 2600 MHz 1866
MSI A75MA-G55	-	A75	1866	3	100	21-24	2400 MHz	A6 3500 2100 MHz 1866
MSI A75MA-G55	-	A75	1866	2	100	27	-	A4 3400 2700 MHz 1866
MSI Z68A-GD80	-	Z68	1866	4	100	34-38	3800 MHz	i7 2600k 3400 MHz 1866
MSI Z68A-GD80	-	Z68	1866	4	100	33-37	3700 MHz	i5 2500 3300 MHz 1866
ASUS Crosshair V	2200	FX990	1866	8	200	22.5	-	FX8150 4500 MHz
MSI 990FXA-GD80	2380	FX990	1820	6	340	12.5	-	Phenom II 1100 4250 MHz
MSI 990FXA-GD80	2400	FX990	1600	6	200	21	-	Phenom II 1100 4200 MHz
MSI 990FXA-GD80	2400	FX990	1600	4	200	22.5	-	Phenom II 980 4500 MHz
MSI 990FXA-GD80	2240	FX990	1500	4	280	16	-	Phenom II 980 4480 MHz
MSI A75MA-G55	-	A75	2000	4	150	29	-	A8 3850 4350 MHz
MSI A75MA-G55	-	A75	2040	4	153	27	-	A8 3800 4133 MHz
MSI A75MA-G55	-	A75	1900	4	142	26	-	A6 3650 3700 MHz
MSI A75MA-G55	-	A75	1900	3	142	24	-	A6 3500 3400 MHz
MSI A75MA-G55	-	A75	2050	2	154	27	-	A4 3400 4160 MHz
MSI 990FXA-GD80	2170	FX990	1650	4	310	12	-	Athlon II 645 3720 MHz
MSI Z68A-GD80	-	Z68	1866	4	100	48	5000 MHz	i7 2600k 5000 MHz
MSI Z68A-GD80	-	Z68	1866	4	100	45	-	i7 2600k 4500 MHz
ASUS P6X58D	3200	X58	1600	4	200	21	-	i7 930 4200 MHz

RAM: 8 GB, (2x4). Timings 9-9-9-24-2T, frequency from 1333 MHz to 2050 MHz, depending on settings and testing conditions;
Video card: AMD HD 6790;
Hard drive: SSD Crucial M4 128 GB;
Power supply: Tagan TG1100-U95 1100 W;
Operating system: Microsoft Windows 7 x64 Sp1.

And three test modes:
1. Nominal frequencies of the processor, memory 1333 MHz.
2. Nominal frequencies of the processor, memory 1866 MHz.
3. Overclocking, the memory runs at different frequencies depending on the multiplier.
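The overclocked rows of the test bench table follow from the usual relation between bus frequency and multiplier. The sketch below recomputes a few of them; the rows are copied from the table, while the function name is mine. Small deviations are expected where the table lists a rounded bus frequency.

```python
def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Core clock as bus (reference) frequency times multiplier."""
    return bus_mhz * multiplier


# (processor, bus MHz, multiplier, clock listed in the table, MHz)
overclocked_rows = [
    ("FX8150", 200, 22.5, 4500),
    ("Phenom II 1100", 340, 12.5, 4250),
    ("Phenom II 980", 280, 16, 4480),
    ("A8 3850", 150, 29, 4350),
    ("A8 3800", 153, 27, 4133),
    ("A6 3650", 142, 26, 3700),
    ("Athlon II 645", 310, 12, 3720),
]

deviations = []
for name, bus, mult, listed in overclocked_rows:
    computed = core_clock_mhz(bus, mult)
    deviations.append(abs(computed - listed))
    print(f"{name:15s} {bus} x {mult} = {computed:.0f} MHz (table: {listed} MHz)")

max_deviation_mhz = max(deviations)
print(f"Largest deviation: {max_deviation_mhz:.0f} MHz")
```

The same relation explains why memory frequency differs between overclocked configurations: raising the bus frequency shifts the memory clock along with the core clock.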

Test results

As a starting point, we took a configuration consisting of a motherboard based on the 990FX chipset, an AMD FX 8150 CPU, and a memory operating at a frequency of 1333 MHz with timings of 9-9-9-24-2T.

Cinebench 10

Settings:

Mono-thread and multi-thread test.
CPU profile.

[Chart: rated mode, performance in points, 1 CPU | Multi CPU]

The test, run on one core and on all cores, does not paint the best picture for the newcomer, which feels out of place when the load falls on a single core. As soon as the program uses all the cores, the situation changes significantly and the FX 8150 becomes a direct competitor to the Intel i5-2500. That, however, is exactly how AMD positions the 8150. Comparing the FX with the i7-930 confirms the superiority of the former over the latter.

[Chart: memory at 1866 MHz, performance in points, 1 CPU | Multi CPU]

Overclocked memory has little effect on the performance of any modern AMD processor, so there is no need to run to the store and acquire high-frequency modules at all.

[Chart: overclocking, performance in points, 1 CPU | Multi CPU]

The FX 8150 is still poorly explored, and overclocking it involves some friction between motherboard and processor. Judging by the temperatures, the Bulldozer could have run at a higher frequency, but higher multipliers simply would not engage. I expect manufacturers will update the BIOS more than once before the components get along. Nevertheless, 4.5 GHz is not a bad figure, and at that clock the newcomer confidently outperforms almost all Intel processors in the multi-threaded test, with the exception of the overclocked i7-2600K.

Name	1 CPU, %	Multi CPU, %	Average, %
FX 8150 3600 MHz	0	0	0
Phenom II 1100 3300 MHz	2	-9	-4
Phenom II 980 3700 MHz	5	-26	-11
Athlon II 645 3100 MHz	-20	-46	-33
A8 3850 2900 MHz	-18	-42	-30
A8 3800 2400 MHz	-28	-51	-40
A6 3650 2600 MHz	-27	-47	-37
A6 3500 2100 MHz	-37	-66	-51
A4 3400 2700 MHz	-28	-72	-50
i7 2600K 3400MHz	52	12	32
i5 2500 3300 MHz	49	1	25
i5 2400 3100 MHz	34	-7	14
i7 930 2800 MHz	8	-15	-4
i3 2100 3100 MHz	23	-46	-11
FX 8150 3600 MHz 1866	0	1	0
A8 3850 2900 MHz 1866	-17	-40	-28
A8 3800 2400 MHz 1866	-27	-48	-37
A6 3650 2600 MHz 1866	-24	-46	-35
A6 3500 2100 MHz 1866	-36	-65	-50
A4 3400 2700 MHz 1866	-26	-72	-49
i7 2600K 3400MHz 1866	52	16	34
i5 2500 3300MHz 1866	50	1	25
FX 8150 4500 MHz	10	23	16
Phenom II 1100 4250 MHz	20	14	17
Phenom II 1100 4200 MHz	19	14	16
Phenom II 980 4500 MHz	27	-11	8
Phenom II 980 4480 MHz	26	-11	8
A8 3850 4350 MHz	23	-12	6
A8 3800 4133 MHz	17	-17	0
A6 3650 3700 MHz	6	-25	-10
A6 3500 3400 MHz	-1	-49	-25
A4 3400 4160 MHz	13	-56	-22
Athlon II 645 3720 MHz	-4	-34	-19
i7 2600K 5000 MHz	106	52	79
i7 2600K 4500 MHz	83	46	64
i7 930 4200 MHz	49	18	34
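The average column in the table above appears to be the mean of the single-threaded and multi-threaded differences, computed before rounding. That is my reading of the numbers, not something the table states. The sketch below checks it on a few rows copied from the table; the helper name is hypothetical.

```python
def mean_difference(single_pct: float, multi_pct: float) -> float:
    """Mean of the single-threaded and multi-threaded percentage differences."""
    return (single_pct + multi_pct) / 2


# (configuration, 1 CPU %, multi CPU %, average % as listed in the table)
rows = [
    ("Phenom II 1100 3300 MHz", 2, -9, -4),
    ("i7 2600K 3400MHz", 52, 12, 32),
    ("i3 2100 3100 MHz", 23, -46, -11),
    ("FX 8150 4500 MHz", 10, 23, 16),
    ("i7 2600K 5000 MHz", 106, 52, 79),
]

gaps = []
for name, single, multi, listed in rows:
    computed = mean_difference(single, multi)
    gaps.append(abs(computed - listed))
    print(f"{name:24s} mean {computed:6.1f}  table {listed:4d}")

max_gap = max(gaps)
print(f"Largest gap between computed mean and table: {max_gap}")
```

The computed means never differ from the listed averages by more than half a point, which is consistent with the table having been rounded from unrounded scores.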

Not only fans of the company's products but also many users who follow progress in IT have been eagerly awaiting AMD processors with the fundamentally new Bulldozer architecture. Over the past few years, while offering interesting solutions in terms of price/performance, AMD has concentrated mainly on entry-level and mid-range devices. By resurrecting the FX line, the company clearly expects to attract more demanding enthusiasts who are ready to experiment and want maximum speed. We will study the capabilities of the new family using the world's first eight-core desktop processor, the AMD FX-8150. Let's see whether the manufacturer can meet the expectations of its fans.

Unlike its main competitor, which can afford to follow the pendulum principle of CPU development, alternating new architectures and new process technologies year by year, AMD does not set specific time frames for its projects, relying instead on its sense of the market and its own technological potential. The history of the Bulldozer architecture began long ago. It was supposed to be presented back in 2009, but due to various circumstances the implementation of its bold engineering ideas in silicon has become possible only now.

For AMD, Bulldozer is a serious, long-term commitment. For the next few years this microarchitecture will be the basis for future processors in various segments: server, desktop and mobile. This applies both to discrete CPUs and to hybrid ones: APUs are also due to move to Bulldozer over time. Only for compact systems does AMD intend to use chips based on the economical Bobcat and its upgraded versions. With the announcement of Bulldozer, the company has decided to revive a legendary series by introducing AMD FX processors, which use the new architecture and are manufactured on the most advanced 32-nanometer process technology.

Architectural features

Bulldozer chips are built from modules containing two x86 computing units. The latter are not completely autonomous: some resources are common to both cores, specifically the prefetcher, the instruction decoder, the FPU and the L2 cache. The monolithic dual-core module allows two threads to run simultaneously, but with certain caveats. According to the manufacturer's calculations, this approach is fully justified and delivers about 80% of the efficiency of full-fledged physical cores, while significantly reducing the number of transistors and, accordingly, the die area and power consumption.

Taking into account the new structure, the internal architecture was seriously redesigned, which actually affected all execution units. There are practically no similarities with K10, which was used for Phenom II and Athlon II chips. AMD has implemented support for AVX, SSE 4.2 and AES-NI instructions and added its own FMA4 and XOP sets.

Like the top Phenom processors, the FX chips have a three-level cache system, but its organization differs noticeably from that of its predecessors. The L1 data cache has shrunk from 64 KB to 16 KB, while its throughput has increased significantly. The 2 MB L2 is shared between both cores of each module, so depending on the number of modules the total second-level cache in an AMD FX processor ranges from 4 to 8 MB. Its latency is slightly higher, the price of optimizing for higher frequencies. Chips with the Bulldozer architecture are also equipped with an 8 MB L3 cache. Given the exclusive scheme, the total cache capacity is quite impressive for desktop models. The improved data prefetch algorithm gives hope that the memory subsystem will be faster. As for RAM itself, FX CPUs support DDR3-1866 modules in dual-channel mode.

The AMD FX uses a 32 nm SOI process similar to that of the Llano APU. The chips are produced at the facilities of the affiliated company GlobalFoundries. The CPU is based on an eight-core die with an area of 315 mm². According to the die layout, most of it is taken up by cache memory, so it is not surprising that the total transistor count is an impressive 2 billion. For comparison, the six-core Phenom II X6 (Thuban) contains "only" 904 million transistors, but because of its 45-nanometer process technology its die area is 346 mm². Given the difference in area, we can assume that FX chips cost less to produce than their predecessors. However, the transition to 32 nm has not been easy for GlobalFoundries. AMD has already reported difficulties with the yield of good dies, because of which the company cannot fully satisfy demand for the hybrid Llano. Let's hope this does not affect the availability of FX in retail, and that everyone who wants one will be able to buy it.

The same die will be used for four- and six-core models, which allows chips with certain defects to be put to use. It is also likely that fully functional chips with deactivated modules will be used for these CPUs. In that case, we can look forward to another lottery with unlocking disabled cores. That would be a great way to stir up interest in AMD FX processors.
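The die figures above translate directly into transistor density. The sketch below only divides the quoted transistor counts by the quoted die areas; the function name is mine, and the result says nothing about cost or yield.

```python
def density_millions_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per square millimeter."""
    return transistors_millions / area_mm2


fx_density = density_millions_per_mm2(2000, 315)     # eight-core FX die
thuban_density = density_millions_per_mm2(904, 346)  # six-core Phenom II X6 die
density_ratio = fx_density / thuban_density

print(f"AMD FX: {fx_density:.2f} M transistors/mm2")
print(f"Thuban: {thuban_density:.2f} M transistors/mm2")
print(f"Ratio:  {density_ratio:.2f}x")
```

On these numbers the FX die packs roughly two and a half times as many transistors into each square millimeter, which fits the remark that most of the die is dense cache memory.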

Processor Specifications

Model	FX-8150	Phenom II X6 1075T	Phenom II X4 975	Core i7-2600K	Core i5-2500K
Codename	Bulldozer	Thuban	Deneb	Sandy Bridge	Sandy Bridge
Number of cores/threads	8/8	6/6	4/4	4/8	4/4
Base clock frequency, GHz	3.6	3.0	3.6	3.4	3.3
Clock frequency with Turbo, GHz	3.9/4.2	3.5	–	3.8	3.7
L2/L3 cache size, MB	8/8	6×0.5/6	4×0.5/6	4×0.25/8	4×0.25/6
Production technology, nm	32	45	45	32	32
Processor socket	AM3+	AM3	AM3	LGA1155	LGA1155
Power consumption (TDP), W	125	125	125	95	95
Recommended price, $	245	181(162*)	175 (160*)	317 (315*)	216 (225*)
* According to the Hotline.ua catalog.

Turbo Core

Turbo Core technology was previously used by AMD for the six-core Thuban and Llano APUs. FX processors have a new mechanism and algorithm for this function. In the case when under load the power consumption of the chip falls within its TDP, and the temperature does not exceed the set value, the frequency can be automatically increased (100–300 MHz) even when all cores are active (All Core Boost). If at least half of the modules are idle, then AMD FX can switch to Max Turbo Boost mode, increasing the supply voltage and significantly increasing the clock frequency of the working units (up to 900 MHz).

AMD has also taken care of improving the efficiency of new chips. Given the growth in the number of computing cores, it is impossible to rely only on the effect of using a thinner process technology. When there is no load on both processor cores within the same module and they enter the C6 power-saving state, the power transistors allow you to turn off power from this node, reducing overall CPU consumption.

Logic support

As with the previous AMD desktop platform, the PCI Express 2.0 controller remains the prerogative of the chipset's northbridge and has not moved under the processor's heat spreader. It is the number of supported lanes of this interface, and hence the ability to build multi-GPU configurations, that distinguishes the new chipsets for Zambezi chips. The top-end AMD 990FX has 42 lanes, which can be allocated to graphics as 2 x16 or 4 x8. The AMD 990X has 26 lanes and allows only two graphics cards to be paired in CrossFireX or SLI mode in a 2 x8 configuration. The AMD 970, with the same number of PCI-E lanes, makes do with a single adapter. In all cases the peripherals are handled by the SB950 southbridge, which brings no interesting innovations: six SATA 6 Gb/s ports with RAID support (0, 1, 5, 10), up to 14 USB 2.0 ports and PCI support. Alas, unlike the AMD A75 chipset for the FM1 platform, there is no support for the high-speed USB 3.0 bus.

AM3+ platform

FX series processors require a motherboard with an AM3+ socket. It can be either a model based on a "new" AMD 9xx chipset or a product with logic from previous generations. Compatibility with AM3 is theoretically possible, but is guaranteed neither by AMD nor by motherboard manufacturers. The latter may release firmware for their top products, but these will be isolated cases. Even then, FX chips will run with slower Turbo Core and Cool'n'Quiet state switching, and any problems with the system's operation will fall on the user's shoulders. So you should not count on a trouble-free upgrade here.

Boards with AM3+ are easily distinguished by the black color of the processor socket, whereas the AM3 socket is white. Fortunately, the cooler mounting has not changed, so any cooler compatible with AM2/AM2+/AM3 is suitable for AMD FX.

The lineup

[Charts: 3DMark 11 CPU test (Physics), points; 3DMark Vantage, points; PCMark 7 Computation test, points; CineBench 11.5, points; x264 HD Benchmark 4.0, fps; 7-Zip 9.20, MIPS; Far Cry 2, 1920×1080, DX10, high quality, fps; Hard Reset, 1920×1080, High mode, fps; Metro 2033, 1920×1080, DX11, PhysX, high quality, fps; Colin McRae: DiRT 3, 1920×1080, high quality, fps; Lost Planet 2, 1920×1080, DX11, high quality, test B, fps; Crysis 2, 1920×1080, DX9, high quality, Downtown test, fps; System power consumption, W]

Thanks to the modular structure of its processors, the company can easily build its lineup by offering devices with different numbers of computing units and clock speeds. At launch, the line of desktop chips, codenamed Zambezi, includes four CPUs. The flagship is the eight-core FX-8150 with a frequency formula of 3.6/3.9/4.2 GHz, 8 MB each of L2 and L3 cache, and a TDP of 125 W. The FX-8120 is equipped the same way and differs only in its frequencies: 3.1/3.4/4.0 GHz. The six-core FX-6100 has 6 MB of L2 cache and the same 8 MB of L3, but its TDP is 95 W. The most affordable version, the FX-4100 with two modules and four x86 computing units, runs at 3.6/3.7/3.8 GHz and has 4 MB of L2, a capacious L3 (8 MB) and a TDP of 95 W. As for cost, the recommended wholesale prices for the listed models are $245/205/165/115, respectively.

Overclocking

The ability to freely overclock processors is one of the key parameters of FX chips. AMD makes a separate emphasis on this feature. The free multiplier is available to all models of the line, and the ability to change it will be present on any board with AM3+.

The FX architecture was designed with high clock speeds in mind from the start. Overclockers armed with vessels of liquid nitrogen managed to capture a CPU-Z screenshot with the processor running at almost 8.5 GHz. To do so, however, they had to leave only one of the four modules active. With all eight cores active, the chip was pushed to 8.1 GHz. Previously only the most lightweight Intel Celerons for LGA775 reached such frequencies. Now enthusiasts have a much more interesting subject for overclocking experiments.

In the case of an air cooling system, you will have to be content with more modest results. When the supply voltage was increased to 1.45 V, the CPU worked stably at 4.6 GHz. Maybe not as impressive, but the potential is clearly better than the 45nm Phenom II chips.

Results

The performance test results are shown in the charts. They are indicative enough to form a general opinion about the capabilities of AMD's new design. As expected, FX processors gained performance in multi-threaded tasks: archiving, HD video encoding, rendering. Here, the eight-core chip is quite capable of competing with both the Core i5-2500K and the more expensive Core i7-2600K. However, as soon as it comes to applications that are poorly optimized for parallel execution, AMD FX chips lose ground: the per-core performance of their x86 blocks is even slightly lower than that of K10-based products. In games that use 3-4 threads at best, Intel processors have a noticeable advantage. At maximum graphics quality settings, where the video card becomes the bottleneck, the systems' performance levels out, but the real potential of the CPU cannot be assessed under such conditions.

The transition to the 32 nm process technology mainly made it possible to keep power consumption at the same level while increasing performance. The priority here was probably performance rather than improved CPU efficiency.

Judging by the pricing of AMD FX alone, it is obvious that the company plans first of all to gain a foothold in the middle price category, deliberately leaving the segment of expensive top-end solutions to Intel. In the current conditions, the manufacturer is objectively unable to perform adequately in the "heavyweight" league. Having bet on multi-core computing, it finds outstanding results in poorly optimized software very hard to achieve. At the same time, just five years ago we sincerely wondered who might need a quad-core processor on a desktop and how to use the resources of such a CPU efficiently. Today this is commonplace, and the advantages of chips with that many computing units no longer raise questions. Perhaps eight-core models will receive similar recognition some time later.

Thankfully, AMD won't be idly watching what happens to its processors. The announced plans for further development inspire restrained but genuine optimism. The company will continue to actively refine the current architecture, improving both energy efficiency and CPU performance, but the promised rate of 10-15% per year is not very impressive. With such figures, a fundamental change in the situation is possible only if Intel slows down the development of its products, and there are no signs of that: the tick-tock mechanism has not yet failed. As early as spring 2012, Ivy Bridge chips will be introduced, made using 22 nm technology with 3D transistors.

The final assessment of the considered architecture and the AMD FX-8150 processor based on it is ambiguous, and this already indicates that the revolution has not happened. At least at this stage, it is invisible to the end user. A qualitative jump in performance takes place on well-parallelized applications, while there is no serious increase in single-threaded tasks. The high expectations placed on the Bulldozer were only partly met. AMD still has a lot of work to do to offer interesting solutions and compete for a place in the hearts of demanding enthusiasts.

Bulldozer is the code name for AMD64 processors made using 32 nm technology and primarily aimed at server platforms and high-performance personal computers.

Innovations
Bulldozer processors have a completely different core architecture from the previous-generation AMD K8 and AMD K10. At a quick glance at the Zambezi die of an 8-core processor, it is easy to make the mistake of seeing only four cores. In fact, these are computing modules: AMD engineers placed the x86 cores in pairs, two per module. Thus, eight-core processors have four modules, six-core processors have three, and quad-core processors have only two. The benefit of this approach is higher processor performance under multi-threaded workloads.

In addition to the standard features inherent in old AMD processors, new ones have been added: SSE4.1, SSE4.2, CVT16, AVX, XOP and FMAC. AMD Fusion technology is also implemented - a combination of a graphics core and a central processor, an analogue of Sandy Bridge technology.

AMD Bulldozer processors now support the new version of AMD Direct Connect technology (which eliminates the data-exchange shortcomings of some architectures), as well as four HyperTransport 3.1 links per processor. The AMD G3MX memory expansion technology makes it possible to significantly increase memory bandwidth.

In addition, we should note full support for DDR3 memory at 1866 MHz and an L3 cache significantly increased to 8 MB.

The power management mechanism has also undergone major changes. The 32 nm process technology played a part, keeping the nominal voltage at or below 1.4 V, but the main contribution comes from the improved clock frequency adjustment mechanism, thanks to which the thermal package does not exceed 125 W.

On previous Phenom II X6 processors, if the load was no more than three threads, the frequency of the three active cores increased by 400 MHz. Bulldozer is equipped with a more flexible speed control mechanism. When a module has no load, the power manager can turn it off together with its L2 cache array, which reduces heat generation. At the same time, the clock frequency of the modules in use can increase if necessary: in Max Turbo mode the increase is up to 900 MHz, and when the load on all computing modules is roughly equal, an increase of up to 300 MHz is possible. The new Bulldozer processors support Turbo Core 2 technology, an analogue of Intel Turbo Boost (raising the processor frequency from a nominal 3.5 to 4.2 GHz), which has a positive effect on performance. Turbo Core stays active until the processor's power consumption exceeds the set TDP (thermal design power) limit. For this reason, the notion of a "standard clock speed" loses its generally accepted meaning for the new AMD FX processors.

By the way, in terms of overclocking potential, it was the AMD FX-8150 processor that was overclocked to 8.429 GHz, which is currently an absolute record.

Unfortunately, the Windows process scheduler is currently not fully optimized for AMD FX CPUs. There is a high probability that two threads of the same program will be assigned to two different modules. This either prevents the use of Max Turbo mode or forces data to be reloaded into the cache of the other module, which ultimately hurts performance.

Reportedly, the task scheduler in Windows 8 will take the features of the Bulldozer architecture into account, which will allow Turbo Core to be used to its full potential. Users of Windows 7 and XP can only hope for an update from Microsoft, or for AMD programmers to release some “magic” driver.
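The scheduling problem described above can be illustrated with a small sketch. It assumes, hypothetically, that cores 2k and 2k+1 belong to module k; the function names are made up for the example and do not correspond to any Windows API or to AMD's actual power-gating logic.

```python
# Toy model of thread placement on a four-module, eight-core chip.
# Assumption (hypothetical numbering): cores 2k and 2k+1 share module k.

MODULES = 4
CORES_PER_MODULE = 2


def module_of(core: int) -> int:
    """Return the module that a core index belongs to."""
    return core // CORES_PER_MODULE


def spread_placement(n_threads: int) -> list[int]:
    """Module-unaware policy: one thread per module before doubling up."""
    order = [m * CORES_PER_MODULE for m in range(MODULES)]
    order += [m * CORES_PER_MODULE + 1 for m in range(MODULES)]
    return order[:n_threads]


def packed_placement(n_threads: int) -> list[int]:
    """Module-aware policy: fill both cores of a module before the next."""
    return list(range(n_threads))


def busy_modules(cores: list[int]) -> int:
    """Count modules that have at least one running thread."""
    return len({module_of(c) for c in cores})


for policy in (spread_placement, packed_placement):
    cores = policy(4)
    idle = MODULES - busy_modules(cores)
    print(f"{policy.__name__}: cores {cores}, idle modules {idle}")
```

With four threads, the spread policy keeps every module busy, so none can be switched off, while the packed policy leaves two modules idle and available for power gating. The packed threads also share an L2 cache, which avoids reloading data into another module's cache.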

We did not test AMD FX (Bulldozer) processors ourselves: a huge number of tests are already available on the Internet, and this article is aimed at introducing the line and its features rather than at producing test results.

Still, a clear picture emerges from the numerous graphs and benchmarks. Compared with the Core i5-2500K, the top-of-the-range FX-8150:

Loses in tests that generate a single-threaded load (where, by the way, the K10-based Phenom II beats it too);

Wins in most multi-threaded tests, where the load is distributed evenly across all 8 cores;

Gets close to the Core i7-2600K thanks to support for AES-NI cryptographic instructions;

Unfortunately disappoints in 3D tests as well, lagging behind its competitors;

Is inferior to the Core i5-2500K in games, even in those that claim to load all processor cores.

Although AMD FX Bulldozer processors are inferior to their competitors in performance, they have good potential for the future. The problem is not that AMD engineers failed to achieve their goal. The plan was for the high performance of processors based on the new microarchitecture to come from a larger number of cores operating at higher frequencies. But when the Bulldozer idea was implemented in silicon, difficulties arose, and the AMD FX CPUs that reached the market did not run at a sufficiently high clock frequency. As a result, each individual core executes relatively few instructions per unit of time, which in turn reduced overall performance. Even the eight cores of the AMD FX-8150 could not compensate for this negative effect.

This explains why, under a multi-threaded load, the eight-core AMD FX-8150 ended up at the level of a quad-core Intel Core i5, while under a single-threaded load it shows very mediocre results.

But AMD is working on the shortcomings, and the second generation of Bulldozer, code-named Piledriver, will soon appear, which gives hope for a more successful product. According to AMD employees, the new generation of processors will deliver 40-50% higher performance than the FX-8150, and the “standard frequency” will be 30% higher than the current one.

As for buying a Bulldozer processor for games, the decision does not look very attractive given the lack of any advantage over Intel processors and AMD's pricing policy.

For specialized multi-threaded tasks such as video processing and rendering, AMD Bulldozer would be a good solution.

Exactly a year ago, we wrote about AMD's new processor microarchitecture, known as Bulldozer. And now, a year later, on October 12, AMD finally announced the AMD FX processor family based on the Bulldozer architecture. Moreover, we got the opportunity to test one of the eight-core processors of the AMD FX family - the AMD FX-8100 processor. So, let's take a closer look at the new AMD processors.

General information

AMD's official press release for the release of AMD FX processors notes that this is a family of fully unlocked and customizable desktop processors using AMD's new multi-core architecture (codenamed Bulldozer).

The AMD FX family includes eight-core (FX-8000 series), six-core (FX-6000 series) and quad-core (FX-4000 series) processors. All AMD FX processors use the AMD AM3+ processor socket.

AMD FX processors based on the Bulldozer microarchitecture are the first AMD processors to be built using the 32nm process technology.

As you know, AMD plans to release three series of processors based on the Bulldozer microarchitecture with the code names Interlagos, Valencia and Zambezi. The Interlagos and Valencia processors are server processors, while the Zambezi processor is aimed at the desktop market. In this article, we will take a closer look at Zambezi processors.

As follows from the company's press release, one of the main advantages of the new AMD Zambezi processors is their incredible overclocking capability. In particular, this is evidenced by the recently set world record for overclocking the eight-core AMD FX processor, entered in the Guinness Book of Records under the title "highest frequency computer processor." That overclocking capability is extremely important to users is beyond doubt. Nevertheless, it is a little strange to hear this from AMD representatives. After all, when AMD processors had obvious problems with clock frequency, representatives of the company stated at every press conference that clock frequency is not the main thing and that processor performance is determined by completely different parameters.

Then again, double standards are characteristic not only of AMD: they are a kind of symbol of America. But let's not criticize American morality and instead take a closer look at AMD FX processors.

So, according to the official press release, AMD introduced four AMD FX processor models in total: the eight-core FX-8150 and FX-8120, the six-core FX-6100 and the quad-core FX-4100 (Table 1). However, one more eight-core processor, the FX-8100, can already be found on sale, and the company will soon also announce the quad-core FX-B4150 and FX-4170.

All AMD FX series processors support AMD Turbo Core, a technology that dynamically optimizes performance at the level of processor cores. It is a simplified analogue of the Intel Turbo Boost technology used in modern Intel processors. Why do we call it a simplified analogue? The fact is that AMD Turbo Core implies three modes of processor operation: at the nominal frequency, in Turbo Core mode and in MAX Turbo mode. In Turbo Core mode, the clock frequency can be raised by several steps simultaneously for all processor cores, but only if this does not exceed the TDP of the processor. MAX Turbo is a mode in which the clock speed of only half of the processor cores is raised by several steps, while the other half of the cores are disabled (they go into the C6 state). Again, MAX Turbo mode is only possible if the power consumption of the processor does not exceed its TDP.

It is clear that single-threaded applications, or applications that cannot load all processor cores, can benefit from MAX Turbo mode, while Turbo Core mode suits well-parallelized applications that load all processor cores.
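The three modes described above can be sketched as a simple selection rule. The frequencies are the FX-8150 formula of 3.6 / 3.9 / 4.2 GHz quoted earlier in the article; the power estimate passed in and the decision logic itself are simplifying assumptions for illustration, not AMD's actual control algorithm.

```python
# Sketch of the nominal / Turbo Core / MAX Turbo modes.
# The selection rule and the power estimate are hypothetical.

NOMINAL_GHZ = 3.6
TURBO_CORE_GHZ = 3.9
MAX_TURBO_GHZ = 4.2
TOTAL_CORES = 8
TDP_WATTS = 125.0


def select_frequency(active_cores: int, estimated_power_w: float) -> float:
    """Pick a clock for the active cores given a hypothetical power estimate."""
    if estimated_power_w >= TDP_WATTS:
        # No headroom left: stay at the nominal frequency.
        return NOMINAL_GHZ
    if active_cores <= TOTAL_CORES // 2:
        # At least half of the cores can sleep in C6: MAX Turbo.
        return MAX_TURBO_GHZ
    # More than half of the cores are busy but power is under the TDP.
    return TURBO_CORE_GHZ


print(select_frequency(active_cores=2, estimated_power_w=80.0))   # 4.2
print(select_frequency(active_cores=8, estimated_power_w=110.0))  # 3.9
print(select_frequency(active_cores=8, estimated_power_w=125.0))  # 3.6
```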

For 2nd generation quad-core Intel Core processors with Turbo Boost technology, the dynamic overclocking of the processor cores is more intelligent. If, for example, all four processor cores are loaded, then within the given TDP the multiplier can be increased by a certain number of steps. If three cores are loaded, the number of steps may be greater. Similarly, when only two cores are loaded, the number of steps becomes higher still, and the maximum frequency is reached when only one core is loaded.

In addition, the Turbo Boost mode can be configured in the BIOS, that is, you can set the maximum multipliers for four, three, two and one active core. You can also set the processor TDP within which Turbo Boost is allowed to operate.

In the case of AMD processors, the possibilities for dynamic overclocking are much more modest. At the same time, in fairness, we note that using the proprietary AMD OverDrive utility that supports AMD FX processors, the AMD Turbo Core mode, like the entire system as a whole, can be configured over a wide range.

All AMD FX family processors are equipped with an 8MB L3 cache and have an integrated DDR3-1866 (and below) memory controller. In addition, there is a 1 MB L2 cache per core in the AMD FX family of processors. Accordingly, in the case of eight-core processors, the total size of the L2 cache is 8 MB, and in the case of quad-core processors - 4 MB.

AMD Bulldozer Processor Core

We wrote in detail about the features of the AMD Bulldozer microarchitecture exactly a year ago in the article "AMD Bulldozer Processor Microarchitecture" (ComputerPress No. 11'2010), and therefore we will not repeat ourselves and go into details again - we will only recall the most important aspects of the AMD Bulldozer microarchitecture.

Speaking of multi-core processors based on the AMD Bulldozer microarchitecture, it is very important to emphasize that a core in the AMD Bulldozer microarchitecture and a processor core in other microarchitectures are not the same thing. Therefore, it is not entirely correct to compare, for example, AMD FX (Zambezi) processors with Intel Core i3/i5/i7 (Sandy Bridge) processors by the number of cores. The fact is that AMD processors based on the Bulldozer microarchitecture have a modular architecture. Each module (in AMD terminology) is dual-core. For example, an eight-core Zambezi processor contains four dual-core modules (Fig. 1).

Fig. 1. Block diagram of an eight-core Zambezi processor

However, what the company calls a core in this case actually falls short of a real processor core. The whole trick here is in the terminology. A module containing two cores could well be called a core, and the cores themselves could be called integer computing clusters. That is, in our opinion, it is more correct to speak not of a module with two cores but of a core with two integer computing clusters. Of course, the operating system will see each such module as two separate cores, but each core of an Intel processor with Hyper-Threading technology is also seen by the operating system as two separate cores, and yet there we speak of a single core capable of processing two threads simultaneously.

However, let's leave the peculiarities of terminology. The main thing to remember is that in the case of an AMD module, we are not talking about true two cores, but about some kind of solution that can simultaneously process two threads. Moreover, in terms of efficiency, such a dual-core AMD module outperforms a single Intel core with Hyper-Threading support, but is inferior in terms of dual-thread processing efficiency to two separate true cores.

Now let's see why you can't put an equal sign between dual-core AMD modules and two true cores.

First of all, in each AMD pseudo-dual-core module, some of the resources are shared between both pseudo-cores. In particular, the front end (preprocessor), which is responsible for fetching instructions from the L1I instruction cache, decoding them and dispatching them to the execution units, is shared between both pseudo-cores, as are the L1I instruction cache and the L2 cache (Fig. 2). In addition, the pseudo-cores of the AMD dual-core module themselves have only integer execution pipelines, and for floating-point data they use an FP cluster shared at the module level. This is reminiscent of the days when the x86 CPU was supplemented with an x87 coprocessor to perform floating-point arithmetic. And although AMD itself does not call this FP execution cluster a coprocessor, in fact it is a coprocessor shared between two cores that can themselves perform only integer operations.

Fig. 2. Block diagram of the dual-core module in the AMD Bulldozer processor microarchitecture

While the L2 cache in the AMD Bulldozer microarchitecture is shared between the two cores of each module, the L3 cache is shared between all processor modules.
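The sharing scheme described above can be summarized as a small data model. This is only an illustrative sketch: the class and field names are invented for the example, and the 2 MB of L2 per module is derived from the article's figure of 1 MB of L2 per core.

```python
# Toy model of a Bulldozer processor: modules of two integer cores that
# share a front end, L1I cache, FP cluster and L2 cache, plus a common L3.
from dataclasses import dataclass, field


@dataclass
class IntegerCore:
    """One 'core' in AMD's terminology: an integer execution cluster."""
    core_id: int


@dataclass
class Module:
    """Two integer cores plus the resources they share."""
    module_id: int
    cores: list[IntegerCore] = field(default_factory=list)
    shared: tuple[str, ...] = ("front end", "L1I cache", "FP cluster", "L2 cache")
    l2_mb: int = 2  # assumed: 1 MB per core, shared by the pair


@dataclass
class Processor:
    modules: list[Module]
    l3_mb: int = 8  # shared by all modules

    def core_count(self) -> int:
        return sum(len(m.cores) for m in self.modules)

    def fp_cluster_count(self) -> int:
        # One FP cluster per module, not per core.
        return len(self.modules)

    def total_l2_mb(self) -> int:
        return sum(m.l2_mb for m in self.modules)


def build_processor(n_modules: int) -> Processor:
    """Build a processor with the given number of dual-core modules."""
    modules = [
        Module(m, [IntegerCore(2 * m), IntegerCore(2 * m + 1)])
        for m in range(n_modules)
    ]
    return Processor(modules)


zambezi = build_processor(4)
print(zambezi.core_count(), zambezi.fp_cluster_count(), zambezi.total_l2_mb())
```

For the four-module chip this reports eight cores but only four FP clusters, which is the asymmetry the text points to when it says a module is not equivalent to two true cores.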

AMD 9-series chipsets

Long before the announcement of the Zambezi processors, AMD announced the AMD 9-series chipsets, which, although compatible with all AMD processors with the AM3+ socket, are focused specifically on the new AMD FX processors.

The AMD 9-series chipsets are the basis for the platform known as AMD Scorpius. In addition to the AMD 9-series chipsets, the AMD Scorpius platform is based on the Zambezi processor, as well as the AMD Radeon HD 6000 series discrete graphics card.

AMD 9-series chipsets support both the new AMD Socket AM3+ and the old Socket AM3. That is, motherboards based on the AMD 9-series chipset are compatible not only with the new Zambezi processors, but also with processors of the previous generation of the Phenom II family with an AMD Socket AM3 socket.

To some extent, AMD 9-series chipsets are an improvement over AMD 8-series chipsets, providing more features. Recall that the new Zambezi processors with Socket AM3+ are theoretically compatible with AMD 8-series chipsets; in that case, however, not all of the functionality of Zambezi processors is available.

The 9-series currently consists of the AMD 990FX (chip code name RD990), AMD 990X (chip code name RD990) and AMD 970 (chip code name RD970) chipsets. All three support the new 942-pin Socket AM3+ and are manufactured on a 65 nm process. All AMD 9-series chipsets include an IOMMU (Input/Output Memory Management Unit), a memory management unit for I/O operations.

Just as the traditional processor memory management unit (MMU) translates the virtual addresses seen by the processor into physical addresses, the IOMMU translates the virtual addresses seen by devices into physical addresses.
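The kind of translation that both an MMU and an IOMMU perform can be shown with a minimal sketch. The page size, table contents and function name are made up for the example; real hardware uses multi-level tables and permission checks that are omitted here.

```python
# Minimal illustration of virtual-to-physical address translation.
# Single-level table with hypothetical contents; not a model of AMD's IOMMU.

PAGE_SIZE = 4096  # bytes per page (assumed)


def translate(page_table: dict[int, int], virtual_addr: int) -> int:
    """Map a virtual address to a physical one via a single-level table."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"no mapping for virtual page {page:#x}")
    return page_table[page] * PAGE_SIZE + offset


# A device addresses virtual page 0x10; the table maps it to frame 0x8A.
device_table = {0x10: 0x8A}
print(hex(translate(device_table, 0x10123)))  # 0x8a123
```

The offset within the page is preserved, and only the page number is remapped. An access to an unmapped page raises an error, which loosely corresponds to the hardware blocking a device from touching memory it was not given.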

AMD 9-series chipsets are connected to the processor via the traditional HyperTransport bus. At the same time, all chipsets support the HyperTransport 3.1 bus with a bandwidth of up to 6.4 GT/s.

The top model, the AMD 990FX, supports 42 PCI Express 2.0 lanes, which are distributed as follows: 32 lanes can be grouped into two PCI Express 2.0 x16 ports or four PCI Express 2.0 x8 ports, and the remaining ten lanes can be grouped into one PCI Express 2.0 x4 port and six PCI Express 2.0 x1 ports, or can be used by controllers integrated on the board.

Naturally, boards based on the top-end AMD 990FX chipset support the CrossFireX technology for combining discrete video cards in the mode of two or four PCI Express x16 slots.

The AMD 990X chipset differs from the AMD 990FX only in the number of supported PCI Express 2.0 lanes. It provides 26 PCI Express 2.0 lanes, of which only 16 can be used to form one PCI Express 2.0 x16 port or two PCI Express 2.0 x8 ports. The remaining lanes can be grouped into one PCI Express 2.0 x4 port and six PCI Express 2.0 x1 ports, or can be used by controllers integrated on the board. The AMD 990X chipset, like its older brother the AMD 990FX, supports CrossFireX technology in the mode of two PCI Express x16 slots.

Boards based on the junior AMD 970 chipset can have only one PCI Express 2.0 x16 slot and do not support CrossFireX technology.
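The lane budgets quoted for the 990FX and 990X can be checked with a few lines of arithmetic. The dictionary layout and function names below are illustrative only; the AMD 970 is left out because the text does not give its total lane count.

```python
# Lane budgets for the 990FX and 990X as quoted in the text.
chipsets = {
    "990FX": {"total": 42, "graphics": 32, "x4_ports": 1, "x1_ports": 6},
    "990X": {"total": 26, "graphics": 16, "x4_ports": 1, "x1_ports": 6},
}


def general_purpose_lanes(cfg: dict[str, int]) -> int:
    """Lanes left after the graphics lanes are set aside."""
    return cfg["total"] - cfg["graphics"]


def lanes_in_ports(cfg: dict[str, int]) -> int:
    """Lanes consumed by the x4 and x1 port grouping named in the text."""
    return 4 * cfg["x4_ports"] + 1 * cfg["x1_ports"]


for name, cfg in chipsets.items():
    print(name, general_purpose_lanes(cfg), lanes_in_ports(cfg))
```

In both cases ten lanes remain after the graphics lanes, and one x4 port plus six x1 ports account for exactly those ten.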

In fact, the functionality of the AMD 990FX, 990X and 970 chips, which are the northbridges of the corresponding chipsets, is limited to their support for PCI Express 2.0 lanes. All other chipset functionality is concentrated in the southbridge. The northbridge and southbridge are linked by the A-Link Express III bus with a bandwidth of 4 GB/s (equivalent to the bandwidth of a PCI Express 2.0 x4 link).
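The equivalence with a PCI Express 2.0 x4 link can be checked numerically. The signalling rate of 5 GT/s per lane and the 8b/10b encoding come from the PCI Express 2.0 specification rather than from this article, and reading the 4 GB/s figure as the sum of both directions is an assumption.

```python
# Usable bandwidth of a PCI Express 2.0 link.
# Parameters from the PCIe 2.0 specification: 5 GT/s per lane, 8b/10b encoding.
GT_PER_S = 5.0
ENCODING_EFFICIENCY = 8 / 10


def pcie2_bandwidth_gb_s(lanes: int, both_directions: bool = False) -> float:
    """Usable bandwidth in GB/s (decimal) for a PCIe 2.0 link."""
    per_lane = GT_PER_S * ENCODING_EFFICIENCY / 8  # Gbit/s -> GB/s
    total = per_lane * lanes
    return total * 2 if both_directions else total


print(pcie2_bandwidth_gb_s(4))                        # 2.0
print(pcie2_bandwidth_gb_s(4, both_directions=True))  # 4.0
```

Under these assumptions an x4 link carries 2 GB/s in each direction, so the quoted 4 GB/s corresponds to the combined traffic in both directions.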

Theoretically, the AMD 990FX, 990X, and 970 northbridges are compatible with the SB710, SB750, SB810, SB850, SB920, and SB950 southbridges. South bridges SB710, SB750, SB810 and SB850 are not new and have been used for a long time. But the SB920 and SB950 bridges are specially designed for AMD 9-series chipsets.

The SB920 and SB950 southbridges support up to 14 USB 2.0 ports, a PCI bus, and six SATA 6 Gb/s (SATA III) ports. The SB950 supports RAID levels 0, 1, 5 and 10, while the SB920 supports only RAID levels 0, 1 and 10. Another difference is that the SB950 supports four PCI Express 2.0 x1 lanes, while the SB920 supports only two.

Naturally, the SB920 and SB950 bridges support HD Audio and Gigabit Ethernet.

Note that the power consumption of the SB920 and SB950 chips is 5W; the power consumption of AMD 990FX Northbridge is 19.6W, AMD 990X Northbridge is 14W, and AMD 970 Northbridge is 13.6W.

The GIGABYTE GA-990FXA-UD7 board is based on the new top AMD 990FX chipset coupled with the AMD SB950 southbridge. It has an ATX form factor (30.5x26.3 cm) and can be used to create gaming and high-performance computers. The board is designed for AMD FX (Zambezi) processors with AM3+ socket, but is also compatible with AMD Phenom II and Athlon II processor families with AM3 socket.

The board provides four DIMM slots for installing memory modules, which allows you to install up to two DDR3 memory modules on each of the two memory channels. In total, the board supports up to 32 GB of memory (chipset specification), and it is optimal to use two or four memory modules with it. Note that in normal mode the board supports DDR3-1866, DDR3-1600, DDR3-1333 and DDR3-1066 memory, and in overclocked mode it also supports DDR3-2000 memory.

To install video cards and other expansion cards, the GIGABYTE GA-990FXA-UD7 has six slots with the PCI Express 2.0 x16 form factor, but, of course, not all of them work at x16 speed.

The AMD 990FX chipset (northbridge) supports 42 PCI Express 2.0 lanes, which are distributed as follows: 32 PCI Express 2.0 lanes can be grouped into two PCI Express 2.0 x16 ports or four PCI Express 2.0 x8 ports. The remaining ten lanes can be grouped into PCI Express 2.0 x4 and PCI Express 2.0 x1 ports or used by controllers integrated on the board.

On the GIGABYTE GA-990FXA-UD7 board, the 32 PCI Express 2.0 lanes supported by the chipset are used for four slots with the PCI Express 2.0 x16 form factor. If only two of these slots are used, they operate at x16 speed; when three or all four are used at the same time, they switch to x8 speed.

Two more slots with the PCI Express 2.0 x16 form factor always work at x4 speed. Thus, in total, 40 PCI Express 2.0 lanes supported by the AMD 990FX northbridge are used for the six slots with the PCI Express 2.0 x16 form factor.

Naturally, the GIGABYTE GA-990FXA-UD7 board supports CrossFireX technology for combining discrete video cards in the mode of two, three or four PCI Express x16 slots, as well as NVIDIA SLI technology.

In addition to the slots with the PCI Express 2.0 x16 form factor, the board has one traditional PCI slot, implemented on the PCI bus supported by the AMD SB950 southbridge. The PCI bus is also used by the VIA VT6308 FireWire controller, which provides two IEEE-1394a ports. One of them is located on the back panel of the board, and the other can be brought out to the back of the PC by connecting the corresponding bracket to the header on the board.

For connecting hard drives, the GIGABYTE GA-990FXA-UD7 board has eight internal and two external SATA ports.

First, there are six SATA 6 Gb/s ports implemented through the SATA controller integrated into the AMD SB950 southbridge. These ports support RAID levels 0, 1, 10 and 5.

Second, the board integrates two dual-port SATA 6 Gb/s Marvell 88SE9172 controllers. One of them provides two internal SATA 6 Gb/s ports with support for RAID levels 0 and 1, and the other provides two external eSATA 6 Gb/s ports (one of them shared with USB). Note that one of the Marvell 88SE9172 controllers uses a PCI Express 2.0 lane supported by the AMD 990FX northbridge, and the other uses a PCI Express 2.0 lane supported by the AMD SB950 southbridge (the AMD SB950 southbridge supports four PCI Express 2.0 lanes in total).

For connecting peripherals, the GIGABYTE GA-990FXA-UD7 board has 18 USB ports. The AMD SB950 southbridge provides 14 traditional USB 2.0 ports (the maximum it supports), eight of which (including the eSATA/USB combo port) are located on the rear panel of the board, while another six can be brought out to the back of the PC by connecting the appropriate brackets to the headers on the board.

In addition, two Etron EJ168 dual-port USB 3.0 controllers are integrated on the board. Two of their ports are located on the back panel of the board, and two more can be brought out to the back of the PC by connecting the corresponding bracket to the header on the board.

Note that one of the Etron EJ168 controllers uses the PCI Express 2.0 lane supported by the AMD 990FX northbridge, and the other uses the PCI Express 2.0 lane supported by the AMD SB950 southbridge.

The audio subsystem of this motherboard is based on the Realtek ALC889 HD audio codec. Accordingly, on the back side of the motherboard there are six mini-jack audio connectors, as well as coaxial and optical SPDIF connectors (outputs).

The board integrates a Realtek RTL8111E gigabit network controller, which occupies one PCI Express 2.0 lane supported by the AMD SB950 south bridge.

If we count the controllers integrated on the GIGABYTE GA-990FXA-UD7 board that use the PCI Express 2.0 bus, there are five in total: two Marvell 88SE9172 controllers, two Etron EJ168 controllers and a Realtek RTL8111E controller. Of the four PCI Express 2.0 lanes supported by the AMD SB950 southbridge, three are used, and of the ten remaining PCI Express lanes in the AMD 990FX northbridge, all ten are used (two PCI Express 2.0 x4 slots and two controllers).

The cooling system of the GIGABYTE GA-990FXA-UD7 board consists of three heatsinks connected to each other by a heat pipe. One heatsink covers the MOSFETs located near the CPU socket, another is installed on the AMD 990FX northbridge, and the third covers the AMD SB950 southbridge.
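The lane accounting for this board can be tallied explicitly. The figures are those given in the text; the assignment of one lane per controller and the dictionary labels are assumptions made for the example.

```python
# Lane accounting for the GIGABYTE GA-990FXA-UD7 as described in the text.
# Assumption: each integrated controller occupies a single x1 lane.
NORTHBRIDGE_LANES = 42
SOUTHBRIDGE_LANES = 4

northbridge_use = {
    "four x16-form-factor slots (2 x16 or 4 x8)": 32,
    "two x16-form-factor slots wired as x4": 2 * 4,
    "Marvell 88SE9172 controller": 1,
    "Etron EJ168 controller": 1,
}
southbridge_use = {
    "Marvell 88SE9172 controller": 1,
    "Etron EJ168 controller": 1,
    "Realtek RTL8111E controller": 1,
}

nb_used = sum(northbridge_use.values())
sb_used = sum(southbridge_use.values())
print(f"northbridge: {nb_used}/{NORTHBRIDGE_LANES} lanes used")
print(f"southbridge: {sb_used}/{SOUTHBRIDGE_LANES} lanes used")
```

The tally gives 42 of 42 northbridge lanes and 3 of 4 southbridge lanes, matching the counts stated in the text.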

Note also that the board has two four-pin and two three-pin connectors for connecting fans.

The GIGABYTE GA-990FXA-UD7 board uses a 10-phase (8+2) CPU voltage regulator based on the Intersil ISL6330 controller and DrMOS technology, in which a pair of MOSFET transistors and their driver chip are integrated into a single DrMOS SiC769CD chip.

AMD FX-8100 Processor Performance

To conclude our review, we present the results of testing the eight-core AMD FX-8100 processor. The processor is, of course, not top-end, but it is quite suitable for getting an idea of the performance of AMD FX processors.

To test the AMD FX-8100 processor, we used the stand with the following configuration:

motherboard - GIGABYTE GA-990FXA-UD7;
chipset - AMD 990FX + SB950;
video driver - ForceWare 280.26;
memory - DDR3-1333;
memory size - 4 GB;
hard drive - WD1002FBYS;

For testing, we used our new test script, ComputerPress Benchmark Script v.10.0, a detailed description of which can be found in this issue of the magazine.

In addition to the AMD FX-8100 processor, we tested Intel Core i7-2600K, Intel Core i5-2500K and Intel Core i5-2400 processors. Their specifications are presented in Table 2.

To test Intel processors, we used the stand with the following configuration:

motherboard - FOXCONN Z68A-A;
chipset - Intel Z68 Express;
video card - NVIDIA GeForce GTX 590;
video driver - ForceWare 280.26;
memory - DDR3-1333;
memory size - 4 GB;
memory operation mode - dual-channel;
hard drive - WD1002FBYS;
operating system - Windows 7 Ultimate (64-bit).

Conclusion

In terms of cost, the AMD FX-8100 processor should be slightly more expensive than the Intel Core i5-2400 processor and slightly cheaper than the Intel Core i5-2500K processor, so its comparison with these processors is quite appropriate.

In addition, recall that the reference system in our testing was an ASUS G53SX laptop with an Intel Core i7-2630QM processor (base clock speed 2 GHz; maximum clock speed in Turbo Boost mode 2.9 GHz), the Intel HM65 Express chipset, 8 GB of DDR3-1333 memory, an NVIDIA GeForce GTX 560M discrete graphics card and a HITACHI HTS547564A9E384 (640 GB) hard drive.

So, if you look at the results of testing the eight-core AMD FX-8100 processor, you can draw only one conclusion: it didn’t work (Table 3). AMD failed to make a high-performance, competitive processor. Eight AMD pseudo-cores lose outright to four true Intel cores. And it turns out that a desktop PC equipped with a powerful graphics card and an eight-core AMD FX-8100 processor is actually 16% weaker than a laptop based on a quad-core Intel processor. Any further comment is probably superfluous.

If we compare the AMD FX-8100 processor with the processors of the Sandy Bridge family, the situation is as follows. It is inferior in performance to the Intel Core i7-2600K by as much as 54%, and to the Intel Core i5-2500K and Intel Core i5-2400 by 46% and 37%, respectively.

In general, AMD has a rather strange trend: the company manages to make each next processor a little worse than the previous one. The only question is who needs such processors.

So, the AMD Bulldozer microarchitecture did not justify itself. The saddest thing is that it turned out to be uncompetitive already at the time of its release, and yet it will be the basis for AMD processors over the next few years. We can only regret being deprived of the pleasure of watching an exciting duel between Intel and AMD.