The following code is broken on AMD's proprietary GLSL compiler:
```glsl
uint idx = ...;
vec4 values = ...;
float some_value = values[idx & 3];
```
It index the wrong components, to fix this the following pessimized code
is emitted when that bug is present:
```glsl
uint idx = ...;
vec4 values = ...;
float some_value;
if ((idx & 3) == 0) some_value = values.x;
if ((idx & 3) == 1) some_value = values.y;
if ((idx & 3) == 2) some_value = values.z;
if ((idx & 3) == 3) some_value = values.w;
```
Component indexing on AMD's proprietary driver is broken. This commit adds
a test to detect when we are on a driver that can't successfully manage
component indexing.
It dispatches a dummy draw with just one vertex shader that writes to an
indexed SSBO from the GPU with data sent through uniforms, it then reads
that data from the CPU and compares the expected output.
nullptr was being returned in the error case, which, at a glance may
seem perfectly OK... until you realize that std::string has the
invariant that it may not be constructed from a null pointer. This
means that if this error case was ever hit, then the application would
most likely crash from a thrown exception in std::string's constructor.
Instead, we can change the function to return an optional value,
indicating if a failure occurred.
Makes the parameter ordering consistent, and also makes the filename
parameter a std::string. A std::string would be constructed anyways with
the previous code, as IOFile's only constructor with a filepath is one
taking a std::string.
We can also make WriteStringToFile's string parameter utilize a
std::string_view for the string, making use of our previous changes to
IOFile.
We don't need to force the usage of a std::string here, and can instead
use a std::string_view, which allows writing out other forms of strings
(e.g. C-style strings) without any unnecessary heap allocations.
This allows for forming comment nodes without making unnecessary copies
of the std::string instance.
e.g. previously:
Comment(fmt::format("Base address is c[0x{:x}][0x{:x}]",
cbuf->GetIndex(), cbuf_offset));
Would result in a copy of the string being created, as CommentNode()
takes a std::string by value (a const ref passed to a value parameter
results in a copy).
Now, only one instance of the string is ever moved around. (fmt::format
returns a std::string, and since it's returned from a function by value,
this is a prvalue (which can be treated like an rvalue), so it's moved
into Comment's string parameter), we then move it into the CommentNode
constructor, which then moves the string into its member variable).
Amends cases where we were using things that were indirectly being
satisfied through other headers. This way, if those headers change and
eliminate dependencies on other headers in the future, we don't have
cascading compilation errors.
Previously, the code was accumulating data into a std::vector and then
tossing all of it away if a setting was disabled.
Instead, we can just check if it's disabled and do no work at all if
possible. If it's enabled, then we can append to the vector and
allocate.
Unlikely to impact usage much, but it is slightly less sloppy with
resources.
A few of the aoc service stubs/implementations weren't fully popping all
of the parameters passed to them. This ensures that all parameters are
popped and, at minimum, logged out.
Given the array is a private static array, we can just make it
internally linked to hide it from external code. This also allows us to
remove an inclusion within the header.
SMDH is a metadata format used in some executable formats for the
Nintendo 3DS. Switch executables don't utilize this metadata format, so
this just a holdover from Citra and can be corrected.
Allows the loading screen code to compile with implicit string
conversions disabled.
While we're at it remove unnecessary const usages, and add it to nearby
variables where appropriate.
Gets rid of the need to special-case brace handling depending on the
overload used, and makes it consistent across the board with how fmt
handles them.
Strings with compile-time deducible strings are directly forwarded to
std::string's constructor, so we don't need to worry about the
performance difference here, as it'll be identical.
In a lot of places throughout the decompiler, string concatenation via
operator+ is used quite heavily. This is usually fine, when not heavily
used, but when used extensively, can be a problem. operator+ creates an
entirely new heap allocated temporary string and given we perform
expressions like:
std::string thing = a + b + c + d;
this ends up with a lot of unnecessary temporary strings being created
and discarded, which kind of thrashes the heap more than we need to.
Given we utilize fmt in some AddLine calls, we can make this a part of
the ShaderWriter's API. We can make an overload that simply acts as a
passthrough to fmt.
This way, whenever things need to be appended to a string, the operation
can be done via a single string formatting operation instead of
discarding numerous temporary strings. This also has the benefit of
making the strings themselves look nicer and makes it easier to spot
errors in them.
Many of these constructors don't even need to be templated. The only
ones that need to be templated are the ones that actually make use of
the parameter pack.
Even then, since std::vector accepts an initializer list, we can supply
the parameter pack directly to it instead of creating our own copy of
the list, then copying it again into the std::vector.
Given the class contains quite a lot of non-trivial types, place the
constructor and destructor within the cpp file to avoid inlining
construction and destruction code everywhere the class is used.
Avoids performing copies into the pair being returned. Instead, we can
just move the resources into the pair, avoiding the need to make copies
of both the std::string and ShaderEntries struct.
Given the offset is assigned a fixed value in the constructor, we can
just assign it directly and get rid of the need to write the name of the
variable again in the constructor initializer list.
Given the disk shader cache contains non-trivial types, we should
default it in the cpp file in order to prevent inlining of the
complex destruction logic.
The standard library expects hash specializations that don't throw
exceptions. Make this explicit in the type to allow selection of better
code paths if possible in implementations.
We don't need to load the code into a vector and then construct a string
over the data. We can just create a string with the necessary size ahead
of time, and read the data directly into it, getting rid of an
unnecessary heap allocation.
std::move does nothing when applied to a const variable. Resources can't
be moved if the object is immutable. With this change, we don't end up
making several unnecessary heap allocations and copies.
Booleans don't have a guaranteed size, but we still want to have them
integrate into the disk cache system without needing to actually use a
different type. We can do this by supplying non-template overloads for
the bool type.
Non-template overloads always have precedence during function
resolution, so this is safe to provide.
This gets rid of the need to smatter ternary conditionals, as well as
the need to use u8 types to store the value in.
These are only used from within this translation unit, so they don't
need to have external linkage. They were intended to be marked with this
anyways to be consistent with the other service functions.
Renames the members to more accurately indicate what they signify.
"OneShot" and "Sticky" are kind of ambiguous identifiers for the reset
types, and can be kind of misleading. Automatic and Manual communicate
the kind of reset type in a clearer manner. Either the event is
automatically reset, or it isn't and must be manually cleared.
The "OneShot" and "Sticky" terminology is just a hold-over from Citra
where the kernel had a third type of event reset type known as "Pulse".
Given the Switch kernel only has two forms of event reset types, we
don't need to keep the old terminology around anymore.
This reduces the boilerplate that services have to write out the current thread explicitly. Using current thread instead of client thread is also semantically incorrect, and will be a problem when we implement multicore (at which time there will be multiple current threads)
Nvidia's proprietary driver creates a real OpenGL compatibility profile
without this option, meanwhile Intel (and probably AMD, I haven't tested
it) require that QSurfaceFormat::FormatOption::DeprecatedFunctions is
explicitly enabled.
This was reduced due to happening on most games and at such constant
rate that it affected performance heavily for the end user. In general,
we are well aware of the assert and an implementation is already
planned.
Avoids inlining destruction logic where applicable, and also makes
forward declarations not cause unexpected compilation errors depending
on where the State class is used.
Lessens the amount of code that needs to be read, and gets rid of the
need to introduce an indexing variable. Instead, we just operate on the
objects directly.
std::memset is used to clear the entire register structure, which
requires that the Regs struct be trivially copyable (otherwise undefined
behavior is invoked). This prevents the case where a non-trivial type is
potentially added to the struct.
std::move within a copy constructor (on a data member that isn't
mutable) will always result in a copy. Because of that, the behavior of
this copy constructor is identical to the one that would be generated
automatically by the compiler, so we can remove it.
This corrects cases where it was possible to write more entries into the
write buffer than were requested. Now, we check the size of the buffer
before actually writing into them.
We were also returning the wrong value for
GetAvailableLanguageCodeCount2(). This was previously returning 64, but
only 17 should have been returned. 64 entries is the size of the static
array used in MakeLanguageCode() within the service binary itself, but
isn't the actual total number of language codes present.
Makes the class less surprising when it comes to forward declaring the
type, and also prevents inlining the destruction code of the class,
given it contains non-trivial types.
These are able to be omitted from the declaration of functions, since
they don't do anything at the type system level. The definitions of the
functions can retain the use of const though, since they make the
variables immutable in the implementation of the function where they're
used.
Instead of retrieving the data from the std::variant instance, we can
just check if the variant contains that type of data.
This is essentially the same behavior, only it returns a bool indicating
whether or not the type in the variant is currently active, instead of
actually retrieving the data.
By default, MSVC doesn't use standards-compliant volatile semantics.
This makes it behave in a standards-compliant manner, making
expectations more uniform across compilers.
For similar reasons to the previous change, we move this to a single
function, so we don't need to duplicate the conversion logic in several
places within main.cpp.
Specifies the conversions explicitly to avoid implicit conversions from
const char* to QString. This makes it easier to disable implicit QString
conversions in the future.
In this case, the implicit conversion was technically wrong as well. The
implicit conversion treats the input strings as ASCII characters. This
would result in an incorrect conversion being performed in the rare case
a branch name was created with a non-ASCII Unicode character, likely
resulting in junk being displayed.
Over time our config values have grown quite numerous in size.
Unfortunately it also makes the single functions we have for loading and
saving values more error prone.
For example, we were loading the core settings twice when they only
should have been loaded once. In another section, a variable was
shadowing another variable used to load settings from a completely
different section.
Finally, in one other case, there was an extraneous endGroup() call used
that didn't need to be done. This was essentially dead code and also a
bug waiting to happen.
This separates the section loading code into its own separate functions.
This keeps variables only visible to the code that actually needs it,
and makes it much easier to visually see the end of each individual
configuration group. It also makes it much easier to visually catch bugs
during code review.
While we're at it, this also uses QStringLiteral instead of raw string
literals, which both avoids constructing a lot of QString instances, but
also makes it much easier to disable implicit ASCII to QString and
vice-versa in the future via setting QT_NO_CAST_FROM_ASCII and
QT_NO_CAST_TO_ASCII as compilation flags.
The C++ standard allows constexpr variables declared with the extern
keyword to have external linkage. Previously MSVC wasn't abiding by
this. This just makes the compiler more standards compliant during
builds.
Given we currently don't make use of anything that would break by this,
this is safe to enable.
The backend is not used until we decide to submit the testcase/telemetry, and creating it early prevents users from updating the credentials properly while the games are running.
Also introduced in REV5 was a variable-size audio command buffer. This
also affects how the size of the work buffer should be determined, so we
can add handling for this as well.
Thankfully, no other alterations were made to how the work buffer size
is calculated in 7.0.0-8.0.0. There were indeed changes made to to how
some of the actual audio commands are generated though (particularly in
REV7), however they don't apply here.
Introduced in REV5. This is trivial to add support for, now that
everything isn't a mess of random magic constant values.
All this is, is a change in data type sizes as far as this function
cares.
"Unmagics" quite a few magic constants within this code, making it much
easier to understand. Particularly given this factors out specific
sections into their own self-contained lambda functions.
Instead of asserting on already stored shader variants, silently skip them.
This shouldn't be happening but when a shader is invalidated and it is
not stored in the shader cache, this assert would hit and save that
shader anyways when the asserts are disabled.
These are actually quite important indicators of thread lifetimes, so
they should be going into the debug log, rather than being treated as
misc info and delegated to the trace log.
Makes the code much nicer to follow in terms of behavior and control
flow. It also fixes a few bugs in the implementation.
Notably, the thread's owner process shouldn't be accessed in order to
retrieve the core mask or ideal core. This should be done through the
current running process. The only reason this bug wasn't encountered yet
is because we currently only support running one process, and thus every
owner process will be the current process.
We also weren't checking against the process' CPU core mask to see if an
allowed core is specified or not.
With this out of the way, it'll be less noisy to implement proper
handling of the affinity flags internally within the kernel thread
instances.
Provides serialization/deserialization to the database in system save files, accessors for database state and proper handling of both major Mii formats (MiiInfo and MiiStoreData)
This option allows picking the compatibility profile since a lot of bugs
are fixed in it. We devs will use this option to easierly debug current
problems in our Core implementation.:wq
flushing is now responsability of children caches instead of the cache
object. This change will allow the specific cache to pass extra
parameters on flushing and will allow more flexibility.
This is a holdover from Citra, where the 3DS has both
WaitSynchronization1 and WaitSynchronizationN. The switch only has one
form of wait synchronizing (literally WaitSynchonization). This allows
us to throw out code that doesn't apply at all to the Switch kernel.
Because of this unnecessary dichotomy within the wait synchronization
utilities, we were also neglecting to properly handle waiting on
multiple objects.
While we're at it, we can also scrub out any lingering references to
WaitSynchronization1/WaitSynchronizationN in comments, and change them
to WaitSynchronization (or remove them if the mention no longer
applies).
The actual behavior of this function is slightly more complex than what
we're currently doing within the supervisor call. To avoid dumping most
of this behavior in the supervisor call itself, we can migrate this to
another function.
The default constructor will always run, even when not specified, so
this is redundant.
However, the context member can indeed be initialized in the constructor
initializer list.
Previously we were building with MBCS, which is pretty undesirable. We
want the application to be Unicode-aware in general.
Currently, we make the command line variant of yuzu use ANSI variants of
the non-standard getopt functions that we link in for Windows, given we
only have an ANSI option-set.
We should really replace getopt with a library that we make all build
types of yuzu link in, but this will have to do for the time being.
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
This is a compile definition introduced in Qt 4.8 for reducing the total
potential number of strings created when performing string
concatenation. This allows for less memory churn.
This can be read about here:
https://blog.qt.io/blog/2011/06/13/string-concatenation-with-qstringbuilder/
For a change that isn't source-compatible, we only had one occurrence
that actually need to have its type clarified, which is pretty good, as
far as transitioning goes.
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.
What this entails, is:
- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and
then swapping and combining them, we can swap them in place.
e.g. for the original swap32 code on x86-64, clang 8.0 would generate:
mov ecx, edi
rol cx, 8
shl ecx, 16
shr edi, 16
rol di, 8
movzx eax, di
or eax, ecx
ret
while GCC 8.3 would generate the ideal:
mov eax, edi
bswap eax
ret
now both generate the same optimal output.
MSVC used to generate the following with the old code:
mov eax, ecx
rol cx, 8
shr eax, 16
rol ax, 8
movzx ecx, cx
movzx eax, ax
shl ecx, 16
or eax, ecx
ret 0
Now MSVC also generates a similar, but equally optimal result as clang/GCC:
bswap ecx
mov eax, ecx
ret 0
====
In the swap64 case, for the original code, clang 8.0 would generate:
mov eax, edi
bswap eax
shl rax, 32
shr rdi, 32
bswap edi
or rax, rdi
ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
mov rax, rdi
bswap rax
ret
now clang also generates the optimal sequence for this fallback as well.
This is a case where MSVC unfortunately falls short, despite the new
code, this one still generates a doozy of an output.
mov r8, rcx
mov r9, rcx
mov rax, 71776119061217280
mov rdx, r8
and r9, rax
and edx, 65280
mov rax, rcx
shr rax, 16
or r9, rax
mov rax, rcx
shr r9, 16
mov rcx, 280375465082880
and rax, rcx
mov rcx, 1095216660480
or r9, rax
mov rax, r8
and rax, rcx
shr r9, 16
or r9, rax
mov rcx, r8
mov rax, r8
shr r9, 8
shl rax, 16
and ecx, 16711680
or rdx, rax
mov eax, -16777216
and rax, r8
shl rdx, 16
or rdx, rcx
shl rdx, 16
or rax, rdx
shl rax, 8
or rax, r9
ret 0
which is pretty unfortunate.
This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:
Init() {
main_process = Process::Create(...);
kernel.MakeCurrentProcess(main_process.get());
}
Load() {
const auto load_result = loader.Load(*kernel.GetCurrentProcess());
if (load_result != Loader::ResultStatus::Success) {
// Handle error here.
}
...
}
which presented a problem.
Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part that the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.
Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.
We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
Now that we have dependencies on the initialization order, we can move
the creation of the main process to a more sensible area: where we
actually load in the executable data.
This allows localizing the creation and loading of the process in one
location, making the initialization of the process much nicer to trace.
Like with CPU emulation, we generally don't want to fire off the threads
immediately after the relevant classes are initialized, we want to do
this after all necessary data is done loading first.
This splits the thread creation into its own interface member function
to allow controlling when these threads in particular get created.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.
Allows the compiler to inform when the result of a swap function is
being ignored (which is 100% a bug in all usage scenarios). We also mark
them noexcept to allow other functions using them to be able to be
marked as noexcept and play nicely with things that potentially inspect
"nothrowability".
Including every OS' own built-in byte swapping functions is kind of
undesirable, since it adds yet another build path to ensure compilation
succeeds on.
Given we only support clang, GCC, and MSVC for the time being, we can
utilize their built-in functions directly instead of going through the
OS's API functions.
This shrinks the overall code down to just
if (msvc)
use msvc's functions
else if (clang or gcc)
use clang/gcc's builtins
else
use the slow path
The template type here is actually a forwarding reference, not an rvalue
reference in this case, so it's more appropriate to use std::forward to
preserve the value category of the type being moved.
Some objects declare their handle type as const, while others declare it
as constexpr. This makes the const ones constexpr for consistency, and
prevent unexpected compilation errors if these happen to be attempted to be
used within a constexpr context.
These indicate options that alter how a read/write is performed.
Currently we don't need to handle these, as the only one that seems to
be used is for writes, but all the custom options ever seem to do is
immediate flushing, which we already do by default.
Without passing in a parent, this can result in focus being stolen from
the dialog in certain cases.
Example:
On Windows, if the logging window is left open, the logging Window will
potentially get focus over the hotkey dialog itself, since it brings all
open windows for the application into view. By specifying a parent, we
only bring windows for the parent into view (of which there are none,
aside from the hotkey dialog).
Avoids dumping all of the core settings machinery into whatever files
include this header. Nothing inside the header itself actually made use
of anything in settings.h anyways.
We need to ensure dynarmic gets a valid pointer if the page table is
resized (the relevant pointers would be invalidated in this scenario).
In this scenario, the page table can be resized depending on what kind
of address space is specified within the NPDM metadata (if it's
present).
In our error console, when loading a game, the strings:
QString::arg: Argument missing: "Loading...", 0
QString::arg: Argument missing: "Launching...", 0
would occasionally pop up when the loading screen was running. This was
due to the strings being assumed to have formatting indicators in them,
however only two out of the four strings actually have them.
This only applies the arguments to the strings that have formatting
specifiers provided, which avoids these warnings from occurring.
Adjusts the interface of the wrappers to take a system reference, which
allows accessing a system instance without using the global accessors.
This also allows getting rid of all global accessors within the
supervisor call handling code. While this does make the wrappers
themselves slightly more noisy, this will be further cleaned up in a
follow-up. This eliminates the global system accessors in the current
code while preserving the existing interface.
Keeps the return type consistent with the function name. While we're at
it, we can also reduce the amount of boilerplate involved with handling
these by using structured bindings.
This doesn't actually work anymore, and given how long it's been left in
that state, it's unlikely anyone actually seriously used it.
Generally it's preferable to use RenderDoc or Nsight to view surfaces.
Given we already ensure nothing can set the zeroth register in
SetRegister(), we don't need to check if the index is zero and special
case it. We can just access the register normally, since it's already
going to be zero.
We can also replace the assertion with .at() to perform the equivalent
behavior inline as part of the API.
Now, since we have a const qualified variant of GetPointer(), we can put
it to use in ReadBlock() to retrieve the source pointer that is passed
into memcpy.
Now block reading may be done from a const context.
- Use QStringLiteral where applicable.
- Use const where applicable
- Remove unnecessary precondition check (we already assert the pixbuf
being non null)
Fills in the missing surface types that were marked as unknown. The
order corresponds with the TextureFormat enum within
video_core/texture.h.
We also don't need to all of these strings as translatable (only the
first string, as it's an English word).
Since c5d41fd812 callback parameters were
changed to use an s64 to represent late cycles instead of an int, so
this was causing a truncation warning to occur here. Changing it to s64
is sufficient to silence the warning.
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
BitField has been trivially copyable since
e99a148628, so we can eliminate these
TODO comments and use ReadObject() directly instead of memcpying the
data.
Makes the return type consistently uniform (like the intrinsics we're
wrapping). This also conveniently silences a truncation warning within
the kernel multi_level_queue.
Rather than make a full copy of the path, we can just use a string view
and truncate the viewed portion of the string instead of creating a totally
new truncated string.
Temporal generally indicates a relation to time, but this is just
creating a temporary, so this isn't really an accurate name for what the
function is actually doing.
TXQ returns integer types. Shaders usually do:
R0 = TXQ(); // => int
R0 = static_cast<float>(R0);
If we don't treat it as an integer, it will cast a binary float value as
float - resulting in a corrupted number.
In several places, we have request parsers where there's nothing to
really parse, simply because the HLE function in question operates on
buffers. In these cases we can just remove these instances altogether.
In the other cases, we can retrieve the relevant members from the parser
and at least log them out, giving them some use.
Applies the override specifier where applicable. In the case of
destructors that are defaulted in their definition, they can
simply be removed.
This also removes the unnecessary inclusions being done in audin_u and
audrec_u, given their close proximity.
Quite a few unused includes have built up over time, particularly on
core/memory.h. Removing these includes means the source files including
those files will no longer need to be rebuilt if they're changed, making
compilation slightly faster in this scenario.
Rather than scream that the file doesn't exist, we can clearly state
what specifically doesn't exist, to avoid ambiguity, and make it easier
to understand for non-primary English speakers/readers.
Quite a bit of these were out of sync with Switchbrew (and in some cases
entirely wrong). While we're at it, also expand the section of named
members. A segment within the control metadata is used to specify
maximum values for the user, device, and cache storage max sizes and
journal sizes.
These appear to be generally used by the am service (e.g. in
CreateCacheStorage, etc).
We need to be checking whether or not the given address is within the
kernel address space or if the given address isn't word-aligned and bail
in these scenarios instead of trashing any kernel state.
For whatever reason, shared memory was being used here instead of
transfer memory, which (quite clearly) will not work based off the name
of the function.
This corrects this wonky usage of shared memory.
Given server sessions can be given a name, we should allow retrieving
it instead of using the default implementation of GetName(), which would
just return "[UNKNOWN KERNEL OBJECT]".
The AddressArbiter type isn't actually used, given the arbiter itself
isn't a direct kernel object (or object that implements the wait object
facilities).
Given this, we can remove the enum entry entirely.
Moves includes into the cpp file where necessary. This way,
microprofile-related stuff isn't dumped into other UI-related code when
the dialog header gets included.
Similarly like svcGetProcessList, this retrieves the list of threads
from the current process. In the kernel itself, a process instance
maintains a list of threads, which are used within this function.
Threads are registered to a process' thread list at thread
initialization, and unregistered from the list upon thread destruction
(if said thread has a non-null owning process).
We assert on the debug event case, as we currently don't implement
kernel debug objects.
Now that ShouldWait() is a const qualified member function, this one can
be made const qualified as well, since it can handle passing a const
qualified this pointer to ShouldWait().
Previously this was performing a u64 + int sign conversion. When dealing
with addresses, we should generally be keeping the arithmetic in the
same signedness type.
This also gets rid of the static lifetime of the constant, as there's no
need to make a trivial type like this potentially live for the entire
duration of the program.
This doesn't really provide any benefit to the resource limit interface.
There's no way for callers to any of the service functions for resource
limits to provide a custom name, so all created instances of resource
limits other than the system resource limit would have a name of
"Unknown".
The system resource limit itself is already trivially identifiable from
its limit values, so there's no real need to take up space in the object to
identify one object meaningfully out of N total objects.
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Since C++17, we no longer need to explicitly specify the type of the
mutex within the lock_guard. The type system can now deduce these with
deduction guides.
Based off RE, most of these structure members are register values, which
makes, sense given this service is used to convey fatal errors.
One member indicates the program entry point address, one is a set of
bit flags used to determine which registers to print, and one member
indicates the architecture type.
The only member that still isn't determined is the final member within
the data structure.
The kernel makes sure that the given size to unmap is always the same
size as the entire region managed by the shared memory instance,
otherwise it returns an error code signifying an invalid size.
This is similarly done for transfer memory (which we already check for).
Many of these functions are carried over from Dolphin (where they aren't
used anymore). Given these have no use (and we really shouldn't be
screwing around with OS-specific thread scheduler handling from the
emulator, these can be removed.
The function for setting the thread name is left, however, since it can
have debugging utility usages.
This was initially added to prevent problems from stubbed/not implemented NFC services, but as we never encountered such and as it's only used in a deprecated function anyway, I guess we can just remove it to prevent more clutter of the settings.
Reports the (mostly) correct size through svcGetInfo now for queries to
total used physical memory. This still doesn't correctly handle memory
allocated via svcMapPhysicalMemory, however, we don't currently handle
that case anyways.
This will make operating with the process-related SVC commands much
nicer in the future (the parameter representing the stack size in
svcStartProcess is a 64-bit value).
This isn't used at all in the OpenGL shader cache, so we can remove it's
include here, meaning one less file needs to be recompiled if any
changes ever occur within that header.
core/memory.h is also not used within this file at all, so we can remove
it as well.
We can just pass in the Maxwell3D instance instead of going through the
system class to get at it.
This also lets us simplify the interface a little bit. Since we pass in
the Maxwell3D context now, we only really need to pass the shader stage
index value in.
The pusher instance is only ever used in the constructor of the
ThreadManager for creating the thread that the ThreadManager instance
contains. Aside from that, the member is unused, so it can be removed.
These functions act in tandem similar to how a lock or mutex require a
balanced lock()/unlock() sequence.
EnterFatalSection simply increments a counter for how many times it has
been called, while LeaveFatalSection ensures that a previous call to
EnterFatalSection has occured. If a previous call has occurred (the
counter is not zero), then the counter gets decremented as one would
expect. If a previous call has not occurred (the counter is zero), then
an error code is returned.
In some cases, our callbacks were using s64 as a parameter, and in other
cases, they were using an int, which is inconsistent.
To make all callbacks consistent, we can just use an s64 as the type for
late cycles, given it gets rid of the need to cast internally.
While we're at it, also resolve some signed/unsigned conversions that
were occurring related to the callback registration.
One behavior that we weren't handling properly in our heap allocation
process was the ability for the heap to be shrunk down in size if a
larger size was previously requested.
This adds the basic behavior to do so and also gets rid of HeapFree, as
it's no longer necessary now that we have allocations and deallocations
going through the same API function.
While we're at it, fully document the behavior that this function
performs.
Makes it more obvious that this function is intending to stand in for
the actual supervisor call itself, and not acting as a general heap
allocation function.
Also the following change will merge the freeing behavior of HeapFree
into this function, so leaving it as HeapAllocate would be misleading.
In cases where HeapAllocate is called with the same size of the current
heap, we can simply do nothing and return successfully.
This avoids doing work where we otherwise don't have to. This is also
what the kernel itself does in this scenario.
Another holdover from citra that can be tossed out is the notion of the
heap needing to be allocated in different addresses. On the switch, the
base address of the heap will always be managed by the memory allocator
in the kernel, so this doesn't need to be specified in the function's
interface itself.
The heap on the switch is always allocated with read/write permissions,
so we don't need to add specifying the memory permissions as part of the
heap allocation itself either.
This also corrects the error code returned from within the function.
If the size of the heap is larger than the entire heap region, then the
kernel will report an out of memory condition.
The use of a shared_ptr is an implementation detail of the VMManager
itself when mapping memory. Because of that, we shouldn't require all
users of the CodeSet to have to allocate the shared_ptr ahead of time.
It's intended that CodeSet simply pass in the required direct data, and
that the memory manager takes care of it from that point on.
This means we just do the shared pointer allocation in a single place,
when loading modules, as opposed to in each loader.
This source file was utilizing its own version of the NSO header.
Instead of keeping this around, we can have the patch manager also use
the version of the header that we have defined in loader/nso.h
The total struct itself is 0x100 (256) bytes in size, so we should be
providing that amount of data.
Without the data, this can result in omitted data from the final loaded
NSO file.
Implements an API agnostic texture view based texture cache. Classes
defined here are intended to be inherited by the API implementation and
used in API-specific code.
This implementation exposes protected virtual functions to be called
from the implementer.
Before executing any surface copies methods (defined in API-specific code)
it tries to detect if the overlapping surface is a superset and if it
is, it creates a view. Views are references of a subset of a surface, it
can be a superset view (the same as referencing the whole texture).
Current code manages 1D, 1D array, 2D, 2D array, cube maps and cube map
arrays with layer and mipmap level views. Texture 3D slices views are
not implemented.
If the view attempt fails, the fast path is invoked with the overlapping
textures (defined in the implementer). If that one fails (returning
nullptr) it will flush and reload the texture.
Makes it more evident that one is for actual code and one is for actual
data. Mutable and static are less than ideal terms here, because
read-only data is technically not mutable, but we were mapping it with
that label.
In 93da8e0abf, the page table construct
was moved to the common library (which utilized these inclusions). Since
the move, nothing requires these headers to be included within the
memory header.
- GPU will be released on shutdown, before pages are unmapped.
- On subsequent runs, current_page_table will be not nullptr, but GPU might not be valid yet.
When #2247 was created, thread_queue_list.h was the only user of
boost-related code, however #2252 moved the page table struct into
common, which makes use of Boost.ICL, so we need to add the dependency
to the common library's link interface again.
Given this is utilized by the loaders, this allows avoiding inclusion of
the kernel process definitions where avoidable.
This also keeps the loading format for all executable data separate from
the kernel objects.
Neither the NRO or NSO loaders actually make use of the functions or
members provided by the Linker interface, so we can just remove the
inheritance altogether.
This function passes in the desired main applet and library applet
volume levels. We can then just pass those values back within the
relevant volume getter functions, allowing us to unstub those as well.
The initial values for the library and main applet volumes differ. The
main applet volume is 0.25 by default, while the library applet volume
is initialized to 1.0 by default in the services themselves.
Modifying CMAKE_* related flags directly applies those changes to every
single CMake target. This includes even the targets we have in the
externals directory.
So, if we ever increased our warning levels, or enabled particular ones,
or enabled any other compilation setting, then this would apply to
externals as well, which is often not desirable.
This makes our compilation flag setup less error prone by only applying
our settings to our targets and leaving the externals alone entirely.
This also means we don't end up clobbering any provided flags on the
command line either, allowing users to specifically use the flags they
want.
We generally shouldn't be hijacking CMAKE_CXX_FLAGS, etc as a means to
append flags to the targets, since this adds the compilation flags to
everything, including our externals, which can result in weird issues
and makes the build hierarchy fragile.
Instead, we want to just apply these compilation flags to our targets,
and let those managing external libraries to properly specify their
compilation flags.
This also results in us not getting as many warnings, as we don't raise
the warning level on every external target.
We really don't need to pull in several headers of boost related
machinery just to perform the erase-remove idiom (particularly with
C++20 around the corner, which adds universal container std::erase and
std::erase_if, which we can just use instead).
With this, we don't need to link in anything boost-related into common.
Rather than make a global accessor for this sort of thing. We can make
it a part of the thread interface itself. This allows getting rid of a
hidden global accessor in the kernel code.
This condition was checking against the nominal thread priority, whereas
the kernel itself checks against the current priority instead. We were
also assigning the nominal priority, when we should be assigning
current_priority, which takes priority inheritance into account.
This can lead to the incorrect priority being assigned to a thread.
Given we recursively update the relevant threads, we don't need to go
through the whole mutex waiter list. This matches what the kernel does
as well (only accessing the first entry within the waiting list).
* gdbstub: fix IsMemoryBreak() returning false while connected to client
As a result, the only existing codepath for a memory watchpoint hit to break into GDB (InterpeterMainLoop, GDB_BP_CHECK, ARMul_State::RecordBreak) is finally taken,
which exposes incorrect logic* in both RecordBreak and ServeBreak.
* a blank BreakpointAddress structure is passed, which sets r15 (PC) to NULL
* gdbstub: DynCom: default-initialize two members/vars used in conditionals
* gdbstub: DynCom: don't record memory watchpoint hits via RecordBreak()
For now, instead check for GDBStub::IsMemoryBreak() in InterpreterMainLoop and ServeBreak.
Fixes PC being set to a stale/unhit breakpoint address (often zero) when a memory watchpoint (rwatch, watch, awatch) is handled in ServeBreak() and generates a GDB trap.
Reasons for removing a call to RecordBreak() for memory watchpoints:
* The``breakpoint_data`` we pass is typed Execute or None. It describes the predicted next code breakpoint hit relative to PC;
* GDBStub::IsMemoryBreak() returns true if a recent Read/Write operation hit a watchpoint. It doesn't specify which in return, nor does it trace it anywhere. Thus, the only data we could give RecordBreak() is a placeholder BreakpointAddress at offset NULL and type Access. I found the idea silly, compared to simply relying on GDBStub::IsMemoryBreak().
There is currently no measure in the code that remembers the addresses (and types) of any watchpoints that were hit by an instruction, in order to send them to GDB as "extended stop information."
I'm considering an implementation for this.
* gdbstub: Change an ASSERT to DEBUG_ASSERT
I have never seen the (Reg[15] == last_bkpt.address) assert fail in practice, even after several weeks of (locally) developping various branches around GDB. Only leave it inside Debug builds.
Makes it an instantiable class like it is in the actual kernel. This
will also allow removing reliance on global accessors in a following
change, now that we can encapsulate a reference to the system instance
in the class.
Within the kernel, shared memory and transfer memory facilities exist as
completely different kernel objects. They also have different validity
checking as well. Therefore, we shouldn't be treating the two as the
same kind of memory.
They also differ in terms of their behavioral aspect as well. Shared
memory is intended for sharing memory between processes, while transfer
memory is intended to be for transferring memory to other processes.
This breaks out the handling for transfer memory into its own class and
treats it as its own kernel object. This is also important when we
consider resource limits as well. Particularly because transfer memory
is limited by the resource limit value set for it.
While we currently don't handle resource limit testing against objects
yet (but we do allow setting them), this will make implementing that
behavior much easier in the future, as we don't need to distinguish
between shared memory and transfer memory allocations in the same place.
The previous code had some minor issues with it, really not a big deal,
but amending it is basically 'free', so I figured, "why not?".
With the standard container maps, when:
map[key] = thing;
is done, this can cause potentially undesirable behavior in certain
scenarios. In particular, if there's no value associated with the key,
then the map constructs a default initialized instance of the value
type.
In this case, since it's a std::shared_ptr (as a type alias) that is
the value type, this will construct a std::shared_pointer, and then
assign over it (with objects that are quite large, or actively heap
allocate this can be extremely undesirable).
We also make the function take the region by value, as we can avoid a
copy (and by extension with std::shared_ptr, a copy causes an atomic
reference count increment), in certain scenarios when ownership isn't a
concern (i.e. when ReserveGlobalRegion is called with an rvalue
reference, then no copy at all occurs). So, it's more-or-less a "free"
gain without many downsides.
With this, all kernel objects finally have all of their data members
behind an interface, making it nicer to reason about interactions with
other code (as external code no longer has the freedom to totally alter
internals and potentially messing up invariants).
After doing a little more reading up on the Opus codec, it turns out
that the multistream API that is part of libopus can handle regular
packets. Regular packets are just a degenerate case of multistream Opus
packets, and all that's necessary is to pass the number of streams as 1
and provide a basic channel mapping, then everything works fine for
that case.
This allows us to get rid of the need to use both APIs in the future
when implementing multistream variants in a follow-up PR, greatly
simplifying the code that needs to be written.
Previously this was required, as BitField wasn't trivially copyable.
BitField has since been made trivially copyable, so now this isn't
required anymore.
Relocates the error code to where it's most related, similar to how all
the other error codes are. Previously we were including a non-generic
error in the main result code header.
These can just be passed regularly, now that we use fmt instead of our
old logging system.
While we're at it, make the parameters to MakeFunctionString
std::string_views.
Instead of holding a reference that will get invalidated by
dma_pushbuffer.pop(), hold it as a copy. This doesn't have any
performance cost since CommandListHeader is 8 bytes long.
There's no real need to use a shared lifetime here, since we don't
actually expose them to anything else. This is also kind of an
unnecessary use of the heap given the objects themselves are so small;
small enough, in fact that changing over to optionals actually reduces
the overall size of the HLERequestContext struct (818 bytes to 808
bytes).
Now that we have the address arbiter extracted to its own class, we can
fix an innaccuracy with the kernel. Said inaccuracy being that there
isn't only one address arbiter. Each process instance contains its own
AddressArbiter instance in the actual kernel.
This fixes that and gets rid of another long-standing issue that could
arise when attempting to create more than one process.
Similar to how WaitForAddress was isolated to its own function, we can
also move the necessary conditional checking into the address arbiter
class itself, allowing us to hide the implementation details of it from
public use.
Rather than let the service call itself work out which function is the
proper one to call, we can make that a behavior of the arbiter itself,
so we don't need to directly expose those implementation details.
This makes the class much more flexible and doesn't make performing
copies with classes that contain a bitfield member a pain.
Given BitField instances are only intended to be used within unions, the
fact the full storage value would be copied isn't a big concern (only
sizeof(union_type) would be copied anyways).
While we're at it, provide defaulted move constructors for consistency.
Because of the recent separation of GPU functionality into sync/async
variants, we need to mark the destructor virtual to provide proper
destruction behavior, given we use the base class within the System
class.
Prior to this, it was undefined behavior whether or not the destructor
in the derived classes would ever execute.
This will be utilized by more than just that class in the future. This
also renames it from OpusHeader to OpusPacketHeader to be more specific
about what kind of header it is.