TimmyIT.com

Building a local AI workstation with Dual AMD AI Pro R9700 32GB – Part 1: Hardware

Intro

I’ve spent the last few months researching and building a dedicated inference workstation that can handle some serious local AI tasks. It runs Ubuntu on an AMD Ryzen 9 7950X CPU with 128GB of RAM and dual AMD AI Pro R9700 32GB graphics cards. After a lot of trial and error and learning about ROCm, Ollama, vLLM, Ubuntu, Cline, Continue, Open WebUI and more, it finally works. I now have a local LLM inference server that handles the workloads I want to run locally, without relying on cloud services. That’s the quick and brief story.

Let’s start from the beginning.

I love tinkering with hardware, always have and most likely always will. Building things and solving problems brings me joy and satisfaction. What’s most interesting is that sometimes you don’t know at the beginning what the outcome will be, and this is one of those times.

I get it: AI this and AI that. All the talk of AI is exhausting and sometimes annoying to hear about. It’s similar to blockchain a couple of years ago, when everything was about blockchain and everyone had to do something with it, or cloud before that, and there are thousands of earlier examples. As an IT professional, I need to dip my toes in the water and see if it’s hot, freezing or something in between. These things come in cycles; however, that does not necessarily mean they’re not worth looking at or that there’s nothing good or positive about them. New technology and innovations won’t solve all problems, but they can solve some. It’s just a matter of finding out which ones and not getting lost in all the marketing and hype cycles.

In this series of articles I want to share some real-world examples I’ve been using to help me with my work. This is not a build guide or a how-to, just my journey and experience. If you find value in that, that’s great. In this post I will focus on the hardware side of things, and in the next one I will focus more on software.

Let’s get into it.

Why local AI / LLMs?

In my opinion, there are use cases for running services in the cloud (i.e., someone else’s datacenter), but there are also use cases for local instances. Cloud services are often easy to get going with, and of course you pay for that ease of use. With AI workloads in the cloud, it’s obvious that the cheap prices we saw at the start will not and cannot continue, given the cost of running all of that infrastructure. What something is worth comes down to the value it generates, and for some the value is worth the spend. Right now, cloud services have a lower cost to get started with compared to buying the hardware yourself; that said, once you know what you want or need, you can factor that into the calculation. It’s also important to understand that the use cases for a private individual and a business are different, as is what each would consider valuable. In this series I’m talking from the perspective of an IT professional and small business owner who has workloads that could benefit from local AI, and the investment cost that comes with that.

I’m an IT professional and consultant. I work with device management, security and automation, and AI is starting to become part of the standard toolset used to perform certain tasks. I also run a business that sells used and refurbished electronics: laptops, gaming computers, cameras, computer hardware and much more. I’ve found a few use cases for AI there as well.

Local AI is not all or nothing. I use both cloud services and local AI in a hybrid setup, where certain tasks and objectives are done in the cloud and others locally. For example, I don’t share sensitive or confidential information with an AI in the cloud; when a task involves sensitive or confidential information, I use the local AI instead. But it’s not always about data security. If I get a wild idea for a project, I always start out locally, and depending on the outcome I either continue locally or try a cloud service. I might also start with a draft in the cloud and later let the local LLM do the heavy compute or build the rest of the project to save tokens and cost. Finding efficient ways of combining local and cloud models has almost become a game to me.

Btop resource monitor showing high GPU utilization.

Hardware

Before we dive into the hardware I ended up using, I think it’s important to state that this is not where I started out. It took a long process, and a number of decisions along the way, to get here. For many people just starting out with local LLMs, it begins with the device they use daily or a spare one. In my case it began with my M5 MacBook Pro, which has 24GB of unified memory; the amount and speed of memory are important when running LLMs (very simplified). However, I did not want the models running on the machine I was working on and needed a way to offload them to another system. This was also when I started looking closer at AMD’s offering for AI compute, and since I had a system with an AMD Radeon RX 9070 XT 16GB, that was my logical next step.

My daily driver: MacBook Pro M5

I started testing on this other Windows system running the AMD Radeon RX 9070 XT and found that all the use cases I threw at it worked. Because of the RX 9070 XT’s 16GB limit, I could not run the larger models I wanted, but it gave me a good baseline for what performance to expect with smaller models. I also learned that the applications I wanted to run and develop could run on the AMD stack.

The system I ended up reusing is one I’ve run a lot of VMs on for a couple of years as part of my lab environment. However, I’ve been cutting down the number of VMs lately in a VM death-cleaning exercise. Once you have the hardware to run a lot of VMs, every excuse to create a new VM turns into a new VM, and over time I ended up with way more than I actually needed.
I now run my VMs on a smaller ITX build, which takes up less space and is more suitable. I might do an article on that system one day.

Getting back on the AI train: the system is reused hardware except for the graphics cards, the AMD AI Pro R9700 32GB.

CPU: AMD Ryzen 9 7950X (16 cores, 32 threads)
Motherboard: ASUS ProArt X670E-Creator WiFi
RAM: 128GB DDR5 (4x32GB) 6000MHz CL30 Corsair Vengeance
GPUs: 2x AMD AI Pro R9700 32GB
Power supply: Corsair RM750 (750W)
Chassis: Fractal Design Meshify

Another thing worth mentioning is that the PCIe slots run at x8 each when both are populated. This means each card gets half the bandwidth to the rest of the system that a single card running at full x16 would get. On paper that sounds like a big hit, but in practice it’s not as bad as it looks.

PCIe 5.0 x8 still gives you around 32 GB/s per card in each direction, the same as PCIe 4.0 x16. So it’s not slow by any historical measure, just less than the maximum the slot is rated for.

The other part is what the cards actually use that bandwidth for. The PCIe bus is mostly used when a model is loaded from system RAM into VRAM, which is a one-time cost per session. After that, the cross-card traffic depends on how the workload is split. With something like Ollama, where model layers get distributed across the two cards and each card handles its part in sequence, the traffic between cards is small and x8 is plenty. With tensor-parallel workloads like vLLM, where both cards work on the same layer at the same time, the traffic is higher and the x8 limit could in theory become a bottleneck. In my testing so far I have not seen it be a problem, but I also have not pushed the system to its absolute limit.
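To put rough numbers on the difference between the two splitting strategies, here is a back-of-the-envelope sketch. It assumes a hypothetical dense transformer (hidden size 5120, 64 layers, FP16 activations), that a layer split sends one hidden-state vector across the card boundary per token, and that tensor parallelism follows the common Megatron-style pattern of two all-reduces per layer using a ring all-reduce. `ModelShape` and both helper functions are illustrative names, not part of Ollama or vLLM.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    hidden_size: int      # width of the hidden-state vector (hypothetical model)
    num_layers: int       # number of transformer blocks
    bytes_per_value: int  # 2 for FP16/BF16 activations


def layer_split_bytes_per_token(m: ModelShape) -> int:
    """Layer split over 2 cards: one hidden state crosses the single card boundary."""
    return m.hidden_size * m.bytes_per_value


def tensor_parallel_bytes_per_token(m: ModelShape, gpus: int = 2) -> float:
    """Megatron-style tensor parallelism: two all-reduces per layer.

    In a ring all-reduce, each GPU sends 2 * (gpus - 1) / gpus times the buffer size.
    """
    buffer = m.hidden_size * m.bytes_per_value
    per_allreduce = 2 * (gpus - 1) / gpus * buffer
    return m.num_layers * 2 * per_allreduce


if __name__ == "__main__":
    shape = ModelShape(hidden_size=5120, num_layers=64, bytes_per_value=2)  # hypothetical
    link_gbs = 31.5  # approx. PCIe 5.0 x8, one direction, before protocol overhead
    prompt_tokens = 4096  # hypothetical long prompt processed in one go
    for name, per_tok in [("layer split", layer_split_bytes_per_token(shape)),
                          ("tensor parallel", tensor_parallel_bytes_per_token(shape))]:
        total = per_tok * prompt_tokens
        print(f"{name}: {per_tok / 1024:.0f} KiB/token, "
              f"{prompt_tokens}-token prompt ~ {total / 1e9:.3f} GB "
              f"~ {total / (link_gbs * 1e9) * 1000:.1f} ms on the link")
```

Under these assumptions a layer split moves about 10 KiB per token, while tensor parallelism moves over a hundred times more, which is where an x8 link could start to show up, mainly when processing long prompts. The sketch ignores per-transfer latency, overlap with compute and how the cards actually route peer-to-peer traffic, so treat it as an order-of-magnitude comparison only.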

If you need the fastest possible token speed for tensor-parallel inference, a Threadripper or EPYC board with full x16/x16 would give you more headroom. For my use case, x8/x8 on a consumer X670E board is a good trade-off between cost and performance.

AMD AI Pro R9700 32GB

What first got me interested in the R9700 was its price tag. With an unlimited budget you can spend some astronomical amounts on hardware in today’s market. Nvidia is and has been the obvious choice for many, but let’s not forget Apple when it comes to local AI: the performance you can get from a MacBook or Mac mini is pretty impressive. Intel has also released a few interesting cards in its B50, B60 and B70 lineup, but they seem to fall behind on the software side at this time. So I went with AMD, as its software seems more mature than Intel’s, at a fraction of the cost of Nvidia.

My first attempt to buy an R9700 did not go as hoped. I bought my first R9700 through Amazon because it was a bit cheaper than at other outlets and shipped faster. Price was not the deciding factor, though; the faster delivery was. I ordered on a Saturday and the card arrived a few days later.

DOA. Sadly, the card was dead on arrival. During unboxing I noticed that the package had already been opened. That was the first warning sign, since the card was bought as brand new. When I held the card in my hands, I noticed a hint of burnt electronics. Ohoooh, another warning sign.

I plugged it into my system and nothing… The card showed no sign of life. No fan spin, no noise, nothing. It was never recognized in Linux or Windows. I tried the card in 3 different systems with different power supplies just to rule out a problem with my own systems. That meant I had to return it, order the card from another place and wait a few more days. Everyone who has bought hardware knows the excitement you feel when new stuff arrives and the eagerness to test it out, and the worst feeling is when it does not work.

I don’t know why it was DOA, but my best guess from the evidence is that the card was sold to a customer through Amazon, and that customer either managed to burn the card or it was defective when they bought it. They then returned it to Amazon and might not have disclosed the defect, or Amazon put it back in stock by mistake even though the customer marked it as defective. Regardless, it sucked for me as a customer, and it was not AMD’s fault.

I placed an order with Proshop.se for one card. That card arrived a few days later and worked perfectly! I then ordered a second one from Proshop, but by then they were out of stock, with an estimated delivery the following week. So I just put in that order, and the second card arrived a week later.

Installation was easy. At first I was a bit worried that a 750W PSU might not be enough for the whole system, but in my testing there have been no issues. Each card draws approximately 200W under load, and as long as the CPU is not being pushed hard, it should not be an issue.


Spec comparison between the R9700 and RX 9070 XT:

| Spec | Radeon AI PRO R9700 32GB | Radeon RX 9070 XT 16GB |
|---|---|---|
| Architecture | RDNA 4 (Navi 48) | RDNA 4 (Navi 48) |
| Compute Units | 64 | 64 |
| Stream Processors | 4,096 | 4,096 |
| AI Accelerators | 128 (2nd gen) | 128 (2nd gen) |
| Ray Accelerators | 64 (3rd gen) | 64 (3rd gen) |
| Base / Game / Boost Clock | 1,620 / 2,350 / 2,920 MHz | ~2,400 / ~2,970 MHz (AIB OC up to ~3,100+ MHz) |
| Memory | 32GB GDDR6 | 16GB GDDR6 |
| Memory Bus | 256-bit | 256-bit |
| Memory Speed | 20 Gbps | 20 Gbps |
| Memory Bandwidth | 640 GB/s | 640 GB/s |
| FP32 (vector) | 47.8 TFLOPS | ~48.7 TFLOPS |
| FP16 (vector) | 95.7 TFLOPS | ~97.3 TFLOPS |
| FP16 matrix (dense / sparse) | 191 / 383 TFLOPS | ~195 / ~389 TFLOPS |
| INT8 matrix (dense / sparse) | 383 / 766 TOPS | ~389 / ~779 TOPS |
| INT4 matrix (dense / sparse) | 766 / 1,531 TOPS | ~779 / ~1,557 TOPS |
| PCIe | 5.0 x16 | 5.0 x16 |
| Display outputs | 4× DisplayPort 2.1a | 3× DP 2.1a + 1× HDMI 2.1b |
| TBP | 300 W | 304 W |
| Form factor | Dual-slot blower (reference) | Triple-slot, axial fans (AIB) |
| Launch | July 23, 2025 | March 6, 2025 |
| MSRP | ~$1,250–1,300 | $599 |

As the comparison shows, the cards are almost identical except for the amount of memory, 32GB vs 16GB, and the MSRP: the R9700 costs roughly double. It’s also worth highlighting that the R9700 isn’t faster than the RX 9070 XT in AI workloads, as they share the same memory bandwidth, which (simplified) is the main factor for LLM token generation speed. What the R9700 gives you is 32GB of VRAM, which lets us run larger models or hold a bigger context window. That matters for things like coding assistants or analyzing larger datasets.
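Both points can be illustrated with simple formulas. For single-stream generation with a dense model, each new token has to read every weight from VRAM once, so memory bandwidth divided by the size of the weights gives an upper bound on tokens per second. The context window costs memory through the KV cache, which stores keys and values for every layer and every token. The model below (32B parameters at ~4.5 bits per weight, 64 layers, 8 KV heads, head dimension 128, FP16 cache) is hypothetical, and the helper names are illustrative; real numbers also depend on runtime overhead and the KV cache reads themselves.

```python
GB = 1e9
GIB = 1024 ** 3


def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Size of the model weights, e.g. ~4.5 bits/weight for a typical 4-bit quant."""
    return params_billion * 1e9 * bits_per_weight / 8


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Keys + values (hence the factor 2) for every layer and every token in the context."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def max_decode_tokens_per_s(weights: float, bandwidth_gbs: float) -> float:
    """Upper bound for single-stream decode: every token reads all weights once."""
    return bandwidth_gbs * GB / weights


if __name__ == "__main__":
    w = weight_bytes(32, 4.5)  # hypothetical 32B model at ~4.5 bits/weight
    kv = kv_cache_bytes(32_768, layers=64, kv_heads=8, head_dim=128)  # 32k context
    print(f"weights: {w / GB:.1f} GB, KV cache at 32k tokens: {kv / GIB:.1f} GiB")
    print(f"total: {(w + kv) / GB:.1f} GB (16GB card: no, 32GB card: yes)")
    print(f"decode upper bound at 640 GB/s: {max_decode_tokens_per_s(w, 640):.1f} tokens/s")
```

Since both cards have 640 GB/s, the bound is the same on either card for a model that fits; the extra VRAM only decides whether it fits at all. With a layer split over two cards, the cards read their halves one after the other, so this single-stream bound stays roughly the same as for one card.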

The actual performance and the use cases for the server are something I will cover in later articles. This one focused on the hardware; performance comes later, after we have looked at the software and use cases.

That’s it for this time. Don’t forget to follow me on X (Twitter) @timmyitdotcom, Bluesky @timmyit.com or connect with me on LinkedIn.
