As you might know, PCIe devices can do DMA operations. Now, imagine that a PCIe device is installed into the target system to examine malware (not sure if this is a stupid idea, btw). Really high-end malware can hide its operations if it detects that a rogue PCIe device is installed. DMA operations can be detected by examining (AFAIK):
[ul]
[li]hardware performance counters[/li]
[li]increased PCIe bus activity[/li]
[li]increased interrupt signals[/li]
[li]or by simply checking for the existence of a “rogue” device. I mean, the device and vendor ID can be spoofed by flashing the device firmware, but a legit-looking driver (with all the digital certificates (code-signing certificates) from big companies such as Intel, nVidia, AMD, Qualcomm, etc.) cannot be installed for the device in the operating system.[/li][/ul]
These are not the only detection vectors, I guess. Learning those detection vectors is one of the reasons I created this thread. These detection vectors might be bypassed by overcoming the timing attacks explained in the following research papers (i.e. extremely interesting resources about detecting hardware-level malware):
TL;DR: So, my question is: can the BIOS theoretically and practically hide the existence of a PCIe DMA device from the operating system? The device would be enumerated by the BIOS just fine and normally at boot, but some mechanism built into the BIOS would prevent the device from being visible to the operating system. Is this possible?
That doesn’t render it invisible, though; it just means the OS won’t load the driver(s) for the device, which also means the device will not function. I think OP wants to know if it’s possible to have a device sitting on the PCI bus, doing something, but able to conceal its presence.
Is the question whether you can make it impossible to detect the device or whether modern operating systems can be fooled?
Because if it’s the latter, snooping on memory with a device driver is easy street. Many game hacks are doing it.
No need for DMA tricks, device drivers can just openly read arbitrary memory addresses with CPU instructions.
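For anyone curious what that looks like, here is a minimal sketch of such a driver-side read, written as a Linux kernel module since that’s what I know best; the physical address 0x1000 is just an arbitrary illustrative value, and a Windows game hack would use the equivalent kernel APIs instead:

[code]
/* Minimal sketch (assumptions: Linux kernel module, arbitrary illustrative
 * physical address). It maps one page of physical RAM and reads it with a
 * plain CPU load - no DMA, no special hardware involved. */
#include <linux/module.h>
#include <linux/io.h>

static int __init peek_init(void)
{
    /* 0x1000 is an arbitrary physical address chosen only for illustration */
    void *va = memremap(0x1000, PAGE_SIZE, MEMREMAP_WB);

    if (va) {
        pr_info("first dword at phys 0x1000: 0x%08x\n", *(u32 *)va);
        memunmap(va);
    }
    return 0;
}

static void __exit peek_exit(void)
{
}

module_init(peek_init);
module_exit(peek_exit);
MODULE_LICENSE("GPL");
[/code]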
But nothing is impossible. Even “red pill” attacks, where the BIOS spoofs all the incriminating evidence the OS could read, can be detected with careful timing measurements.
Yes, but the entire read and write operation can be done without a kernel-mode component (device driver). For example, an FPGA PCIe device can be programmed to read and write memory without a driver.
If you have access to the hardware, then you can easily hide things from the OS. That has pretty much always been true.
If you write a custom BIOS, you can have it hide devices from the OS by simply not putting them into the enumeration lists when the OS queries the BIOS for devices.
If you don’t have a custom BIOS, you can instead fake out the device IDs and such so that the PCI BIOS will think nothing is there (I can’t remember if it is all 0s or all FFs but one or the other will definitely do it). From the PC’s point of view the device can never work since it cannot be addressed either by memory or I/O. But on any computer bus that supports bus mastering (having a slave device take control of the bus), the slave device can then issue memory and I/O commands once it has control of the bus.
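For what it’s worth, the PCI spec reserves the all-ones vendor ID (0xFFFF), and a read from an empty slot comes back as all ones, so that is the value enumeration code keys off. Here’s a minimal, Linux-specific sketch of that check using the legacy 0xCF8/0xCFC configuration mechanism; a real BIOS or OS performs the same vendor-ID test, these days usually through memory-mapped (ECAM) config space rather than port I/O:

[code]
/* Minimal sketch: scan PCI configuration space via the legacy port-I/O
 * mechanism and treat a vendor ID of 0xFFFF as "no device in this slot".
 * Linux-specific, needs root for iopl(); only function 0 is probed to keep
 * the example short. */
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>

static uint32_t pci_config_read32(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    uint32_t addr = (1u << 31) | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC);
    outl(addr, 0xCF8);          /* CONFIG_ADDRESS */
    return inl(0xCFC);          /* CONFIG_DATA */
}

int main(void)
{
    if (iopl(3) != 0) {
        perror("iopl (need root)");
        return 1;
    }

    for (unsigned bus = 0; bus < 256; bus++) {
        for (unsigned dev = 0; dev < 32; dev++) {
            uint32_t id = pci_config_read32(bus, dev, 0, 0x00);
            uint16_t vendor = id & 0xFFFF;

            if (vendor == 0xFFFF)
                continue;       /* reads back as all ones: treated as an empty slot */

            printf("%02x:%02x.0  vendor %04x  device %04x\n",
                   bus, dev, vendor, (id >> 16) & 0xFFFF);
        }
    }
    return 0;
}
[/code]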
I haven’t looked up the details on PCIe bus mastering, but on PCI it’s relatively easy to do.
You could catch a device like this in operation by using a data analyzer and triggering on the bus control lines, so it’s not possible to completely hide it, but the OS wouldn’t have a clue that anything was going on.
If you are monkeying with hardware, you could also replace the PC’s RAM with dual ported RAM, and have your whiz-bang spiffy snooper device / malware inserter directly write to the RAM completely outside of the PC’s CPU and buses. Then you’d have complete access to the PC’s memory and nothing on the PC side of things could possibly detect it.
I’m not sure what the point of this is, since anything in this discussion involves hardware modifications to the device in question. There’s no malware threat to any normal PC from any of this.
As I understand it, the hypervisor doing the attack is actually running at Ring 0. And any calls the OS makes that would be affected by which ring it’s in get intercepted and the response faked, so the correct response is received (but more slowly, because of all the extra code to emulate the call).
Doesn’t the device still have to assert an interrupt before it can become the bus master? I’m still struggling to think of how a PCI/PCIe device could sit on the bus, take control of it so it can read/write data, and do that without the CPU/OS noticing. Or corrupting a bunch of stuff while it tries to “talk out of turn.”
Like I said, I haven’t looked up how PCIe handles bus mastership, but PCI has bus request and bus grant signals. Whoever is attempting to use the bus asserts its PCI request signal (not an interrupt), and the PCI arbiter decides who actually gets the bus. The PCI arbiter is usually part of the PCI chipset and the CPU is generally not involved. Again, there’s no interrupt, so the CPU is completely unaware of what is going on.
The bus request and bus grant signals are REQ and GNT (18B and 17A) on this diagram:
The interrupt pins are INTA, INTB, INTC, and INTD (6A through 8B). Bus grant and request are a different thing than interrupts.
The CPU can’t initiate a PCI bus cycle until its bus grant signal is asserted. If two devices try to initiate a bus cycle simultaneously, only one of them gets a bus grant signal. The other device has to wait its turn. This is what prevents data corruption.
And since this happens at the bus and the arbiter, the CPU isn’t involved and doesn’t have a clue about it. If you trigger on the arbiter’s REQ and GNT lines with a data analyzer you can see when the PCI device takes control of the bus, otherwise, other than things maybe getting a bit slower if the device is initiating a bunch of PCI data transfers, you’re not going to see it at all from the CPU side of things.
I have worked with some Intel Ethernet chips which do this sort of thing. When you set up the Ethernet chips, you tell it where in the PC’s memory you want it to store incoming messages and where you want it to get outbound messages from. When an Ethernet message comes into the chip, the chip automatically takes bus mastership, transfers the data into the ring buffer (where you told it to put the message in the PC’s RAM), and only after the transfer is complete does the Ethernet chip bother to give you an interrupt to let the OS know that there’s something it needs to do now. The OS just needs to look at the ring buffer in RAM. It doesn’t even need to access the Ethernet chip at that point. Ethernet messages can be coming in and can be transferred into system RAM while the CPU is doing all sorts of other things, like reading and writing to RAM, accessing disk I/O, or whatever. There’s no data corruption because the PCI arbiter makes sure that things don’t collide.
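To make that pattern concrete, here’s a rough sketch of the descriptor-ring scheme; all the names (rx_desc, DESC_DONE, RING_SIZE, process_received_frames) are made up for illustration, and real Intel parts define their own register and descriptor layouts:

[code]
/* Simplified receive descriptor ring, as described above. The NIC bus-masters
 * incoming frames directly into host RAM and only then raises an interrupt;
 * the driver reads everything from memory without touching the NIC. */
#include <stdint.h>
#include <stddef.h>

#define RING_SIZE  64
#define DESC_DONE  (1u << 0)      /* set by the NIC once a frame has been DMA'd */

struct rx_desc {
    uint64_t buffer_addr;         /* physical address of a receive buffer in host RAM */
    uint16_t length;              /* filled in by the NIC: bytes received */
    uint16_t status;              /* DESC_DONE when the transfer has completed */
};

/* The driver allocates the ring in DMA-able memory and writes its physical
 * address into a device register at init time. */
static struct rx_desc rx_ring[RING_SIZE];
static void *rx_buffers[RING_SIZE];   /* driver's virtual pointers to the same buffers */
static unsigned rx_head;              /* next descriptor we expect the NIC to complete */

/* Interrupt (or polling) handler: consume finished frames from system RAM,
 * then hand the descriptors back to the hardware. */
void process_received_frames(void (*deliver)(const void *frame, size_t len))
{
    while (rx_ring[rx_head].status & DESC_DONE) {
        struct rx_desc *d = &rx_ring[rx_head];

        deliver(rx_buffers[rx_head], d->length);

        d->status = 0;                          /* descriptor belongs to the NIC again */
        rx_head = (rx_head + 1) % RING_SIZE;
    }
}
[/code]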
About the worst thing a PCI device could do is assert its bus REQ signal and then not do anything with it. On older types of buses (like Multibus) you could lock up the system by simply refusing to release bus mastership, but part of the PCI spec is that the arbiter will only wait a certain number of clock cycles, and if the device requesting the bus doesn’t actually initiate a data transfer by then, the arbiter assumes the device is malfunctioning and takes its bus grant away from it.
How the arbiter does its thing isn’t part of the PCI spec (IIRC), but I believe that most of them do some sort of round robin type of thing if multiple devices keep trying to assert bus mastership, so that everyone gets a turn eventually.
I realize this is a bit technical, so if anyone wants something explained a bit more, just ask.
Have the device hide in plain sight. Take some common PCI-e device, like a network controller or sound chip. Reimplement the external interface, via reverse engineering or just reading open driver implementations. But at the same time, build in whatever HW spy stuff you need.
The OS won’t know the difference, and without a bus analyzer, neither will the user.
Some devices have extra security protections, making it more difficult to spoof yourself. But I am certain there are hundreds of devices with legitimate, signed, and auto-installing device drivers with no such checks.
This doesn’t require nation-state resources. I’m not certain about PCI-e, but counterfeit USB devices that work just like the real thing are extremely common. FTDI USB->serial converter chips are one example that I’ve personally run into.
It is very common for malware to detect the execution environment and take action to hide its activity. In particular, many malware implementations change their behaviour or stop operating if they detect they are running in a virtual machine - infecting a VM is a common isolation technique used by malware researchers. Similarly, some malware can detect memory dumping tools, and either manipulate them or hide to prevent memory analysis from obtaining running executable code.
MaverocK seems to be asking whether a PCIe memory analyser could be constructed that can read arbitrary system memory without alerting running malware to its presence, by hiding from the OS device list while still reading system memory.
From the description by engineer_comp_geek, it certainly looks possible, although how fast you can dump memory would remain to be seen.
As noted, dual-ported memory also would provide a visibility window into the system memory that would probably be invisible to running malware, and might be easier to implement - I’m not a hardware engineer. However, it may also be susceptible to timing-based detection, as I am sure you would need some form of contention management.
Bolded is the critical part as to whether this would actually be particularly useful for hacking.
Are you allowed to claim anywhere in the address space for your DMA buffer or are you limited to specific addresses that are pre-authorized by software in the boot chain?
I’m only knowledgeable about microcontroller DMA, which is limited to specific addresses on the chips I have used it on.
I think the hypervisor is considered Ring -1. Timing attacks to detect hypervisor presence are described in research papers, but I am not sure if anyone has been able to implement them without false positives. There are people who have been using hypervisors to evade malware or anti-cheat detections. One of them told me he used QEMU/KVM (but a patched version).
However, I am not sure how this hypervisor talk is relevant to detecting BIOS patching aimed at hiding a device from the operating system. Sorry, I didn’t understand you fully.
In my experience, it’s utterly trivial to detect. Call QueryPerformanceCounter (rdtsc) and see how long it takes (wall clock). On a native system, it takes nanoseconds. On a virtualized system, it takes microseconds. It’s so slow that we’ve had to implement workarounds for our device driver.
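To make the test concrete, here’s a minimal sketch of that kind of timing check, assuming x86-64 with GCC or Clang. I’m using CPUID between the rdtsc reads because it unconditionally causes a VM exit under typical hypervisors; the cycle counts in the comments are only ballpark figures, not calibrated thresholds:

[code]
/* Minimal timing-based hypervisor check (sketch). Averages the cost of a
 * CPUID instruction measured with rdtsc; a trapped/emulated CPUID is
 * typically orders of magnitude slower than on bare metal. */
#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>
#include <x86intrin.h>

int main(void)
{
    unsigned a, b, c, d;
    const int iterations = 1000;
    uint64_t total = 0;

    for (int i = 0; i < iterations; i++) {
        uint64_t start = __rdtsc();
        __get_cpuid(0, &a, &b, &c, &d);   /* forces a VM exit if virtualized */
        uint64_t end = __rdtsc();
        total += end - start;
    }

    /* Roughly: a few hundred cycles on bare metal, thousands under a
     * hypervisor that traps CPUID. */
    printf("average cycles per CPUID: %llu\n",
           (unsigned long long)(total / iterations));
    return 0;
}
[/code]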
This is definitely an interesting hypothetical. If a PCIe device returned all ones for its device ID and vendor ID, it would appear to software as if there were no device there at all. What I’m not sure about is whether the PCIe hardware would allow such a device to perform DMA or not. By default, PCIe devices do have access to the entire memory space.
An OS should be able to defeat such a device through the use of the I/O Memory Management Unit (IOMMU). The IOMMU acts like the MMU, but instead of putting processes in a virtual memory space it places I/O devices like PCIe devices in a virtual memory space. The intended use-case is allowing virtual machines to run drivers against physical devices without allowing them to DMA to or from the hypervisor’s memory, but it is also possible to use it to prevent erroneous DMA from a device. What I’m not sure about is how the interface actually works. The OS may have to enable the IOMMU on a device-by-device basis, in which case it has to know which devices are present (or preemptively enable the IOMMU on all 2**16 possible PCIe devices).
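On Linux, at least, you can see which devices the kernel has actually placed behind the IOMMU by walking the standard sysfs path /sys/kernel/iommu_groups; here’s a minimal sketch (it assumes the IOMMU is enabled, e.g. with intel_iommu=on, and prints nothing interesting otherwise):

[code]
/* Minimal Linux-specific sketch: list the PCI devices in each IOMMU group
 * by reading /sys/kernel/iommu_groups/<group>/devices/. */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    const char *base = "/sys/kernel/iommu_groups";
    DIR *groups = opendir(base);
    if (!groups) {
        puts("no IOMMU groups found (IOMMU disabled or not supported)");
        return 0;
    }

    struct dirent *g;
    while ((g = readdir(groups)) != NULL) {
        if (g->d_name[0] == '.')
            continue;

        char path[512];
        snprintf(path, sizeof path, "%s/%s/devices", base, g->d_name);

        DIR *devs = opendir(path);
        if (!devs)
            continue;

        struct dirent *d;
        while ((d = readdir(devs)) != NULL) {
            if (d->d_name[0] == '.')
                continue;
            printf("group %s: %s\n", g->d_name, d->d_name);  /* e.g. 0000:00:1f.6 */
        }
        closedir(devs);
    }
    closedir(groups);
    return 0;
}
[/code]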
I am not an expert on this so I have to ask. Does it apply to bare-metal hypervisors? Can you detect a bare-metal hypervisor via QueryPerformanceCounter?
I’m by no means an expert, either–I’ve just seen certain bug reports fly by.
My understanding is that the rdtsc instruction (“read timestamp counter”) is trapped because the values can have different offsets between VM instances, or require fixing after being paused or migrated between different machines.
I don’t have a great understanding about exactly what’s meant by a bare-metal hypervisor. To me, bare-metal means no hypervisor at all. In that case, rdtsc is read directly from the on-chip counter and has almost zero overhead.
Maybe it’s possible to configure a VM instance in a way where there’s only one on a given machine, and it doesn’t support migration or any other fancy stuff. In that case, there would be no need to trap rdtsc and so maybe it could be disabled.
Also–and again, I’m not an expert, either–I get the impression that newer CPUs have native support for a virtualized rdtsc instruction, which presumably means there is some offset value that gets context-switched between instances. If true, you could get native speed in that case.
I am replying to this thread after a while. My friends and I tested this by setting the device and vendor ID to 0xFFFF. The device literally won’t appear to the OS. Device Manager won’t show it. SIV (System Information Viewer, http://rh-software.com/) also won’t show it.
Thank you so much for sharing valuable information. This is excellent information.