Optimize AttributeBuffer to OutputVertex conversion
First I unrolled the inner loop, then I pushed semantics validation
outside of the hotloop.
I also added overflow slots to avoid conditional branches.
Super Mario 3D Land's intro runs at almost full speed when compiled with
Clang, and theres a noticible speed increase in MSVC. GCC hasn't been
tested but I'm confident in its ability to optimize this code.
Most applications call AcquireRight before calling RegisterInterruptRelayQueue so we can't assign the thread id there.
This fixes the bug with LLE applets not launching properly.
This is true for all interrupts except PDC0 and PDC1, which should be triggered for all registered threads.
TODO: The real GSP module seems to only trigger PDC0 after updating the screens (both top and bottom). PDC1 doesn't seem to be triggered at all.
The registered interrupt event is unique to each session that calls RegisterInterruptRelayQueue, and only that event should be reset when UnregisterInterruptRelayQueue is called.
Several games such as Smash will cause some regions that are cached on
the gpu to be revalidated, but (seemingly) we can just ignore these
cases. If the data is already found on the gpu in dirty_regions, then we
validate those, and skip flushing that region from cpu.
Its unknown if this breaks any games, but it does speed up many games.
Additionally, it removes outlines in the pokemon games.
The previous commits added the methods where they were located
originally to try to get an easy to read diff between changes. This
commit fixes compliation since the static methods are now declared
before they are used.
Changes the public interface of the surface cache to make it easier to
use. Reintroduces the cached page count cached pages that was removed in
an earlier commit.
Breaks CachedSurface into two classes, the parameters used to create or
find a cached surface, and the actual cached surface. This also adds a
few helper methods for getting surfaces from cache
In a future commit, the count of cached pages will be reintroduced in
the actual surface cache. Also adds an Invalidate only to the cache
which marks a region as invalid in order to try to avoid a costly flush
from 3ds memory
The only functional change is the error handling of GSP_GPU::ReadHWRegs function. We previously didn't return error codes (not even for success). The new returns were found by reverse engineering the GSP module.