Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support fence.i #21

Open
wants to merge 2 commits into
base: riscv64-linux
Choose a base branch
from

Conversation

mingyuan-xia
Copy link

@mingyuan-xia mingyuan-xia commented Aug 14, 2024

#19 partially
sorry for the inconsistent commit log, should be:
riscv64: Support fence.i
The project-level README is left unmodified.

@mingyuan-xia mingyuan-xia changed the title support fence.i [WIP] support fence.i Aug 15, 2024
@mingyuan-xia
Copy link
Author

Note: fence.i is kind of equivalent to ARM's isb. ARM's toIR emits an Imbe_Fence node upon isb so I did the same for fence.i.
ref: https://github.com/petrpavlu/valgrind-riscv64/blob/riscv64-linux/VEX/priv/guest_arm_toIR.c#L16012
Yet, I am not totally sure, as fence.i is stronger than Imbe_Fence's semantic specified in VEX IR.

Additionally, a quick search shows that the unprivileged fence.i is not common in userspace programs, even for a JIT like OpenJDK, see: openjdk/jdk#9770

@mingyuan-xia mingyuan-xia changed the title [WIP] support fence.i support fence.i Aug 15, 2024
@milkybird98
Copy link

In the ARM backend, the Imbe_Fence node corresponds to three barrier instructions, including an ISB instruction to ensure instruction synchronization, similar to what fence.i did:

case ARM64in_MFence: {
*p++ = 0xD5033F9F; /* DSB sy */
*p++ = 0xD5033FBF; /* DMB sy */
*p++ = 0xD5033FDF; /* ISB */
goto done;
}

This means that the ARM backend has strengthened the semantics of the Imbe_Fence node.

However, in the RISC-V backend, the Imbe_Fence node corresponds to a single fence instruction for all types of device I/O and memory accesses.

case RISCV64in_FENCE: {
/* fence */
p = emit_I(p, 0b0001111, 0b00000, 0b000, 0b00000, 0b000011111111);
goto done;
}

We might consider doing something similar, or perhaps introduce a new Imbe_Fence_I? node with correct semantics.

@petrpavlu
Copy link
Owner

Additionally, a quick search shows that the unprivileged fence.i is not common in userspace programs, even for a JIT like OpenJDK, see: openjdk/jdk#9770

Right, I wouldn't expect to see FENCE.I used in userspace applications. The RISC-V Instruction Set Manual Volume I: Unprivileged Architecture has a note on it:

Because FENCE.I only orders stores with a hart’s own instruction fetches, application code should only rely upon FENCE.I if the application thread will not be migrated to a different hart. The EEI can provide mechanisms for efficient multiprocessor instruction-stream synchronization.

In the ARM backend, the Imbe_Fence node corresponds to three barrier instructions, including an ISB instruction to ensure instruction synchronization, similar to what fence.i did:
[...]
We might consider doing something similar, or perhaps introduce a new Imbe_Fence_I? node with correct semantics.

I think that from the perspective of the VCPU that Valgrind implements, FENCE.I would effectively need to result in discarding all translations. Something as Ijk_InvalICache with guest_CMSTART and guest_CMLEN covering the whole address range.

@mingyuan-xia
Copy link
Author

SMC (self-modifying code, somewhat seen in malware/injected code/thunk) is one more use case of fence.i, which changes the code near PC. But again, I am not convinced to support such cases with great efforts (changing backend/incurring great runtime overhead for other cases).

Based on the very rare use cases listed, I would recommend emitting an Imbe_Fence with a warning. Consider to run instead of crash on some random fence.i-s emitted by an old compiler/JIT, like the OpenJDK above-mentioned. After all, fence.i is a standardized instruction allowable in userspace. Would like to hear your opinion on this @petrpavlu.

@petrpavlu
Copy link
Owner

I'm somewhat in favor of implementing this instruction properly if it is added. Turning it into Ijk_InvalICache should be fairly simple, without a need for any backend changes.

Userspace programs shouldn't really need to use this instruction, but instead need to invoke the riscv_flush_icache() syscall. IMO the instruction is then not worth optimizing for and I wouldn't worry about the penalty incurred by Ijk_InvalICache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants