Memory API is inefficient at freeing. #34

Open
opened 2024-05-30 16:10:04 -04:00 by lda · 3 comments
Contributor

The Memory API is currently pretty inefficient at freeing data once a lot of allocations exist, probably due to hash collisions: in my tests, it generally took about 15K "probing" checks to find the `MemoryInfo` for a pointer once 20K allocations had already been made. This happens a lot when, for example, decoding large JSON streams.
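
For context, roughly the kind of lookup I mean (the `Slot` and `FindInfo` names here are made up for illustration, not the actual Memory code); every collision costs one more loop iteration, which is where the thousands of probes come from:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot in a hash table that maps user pointers to their
 * bookkeeping records; purely illustrative, not Cytoplasm's actual code. */
typedef struct
{
    void *pointer; /* pointer handed back to the caller */
    void *info;    /* bookkeeping record for that allocation */
} Slot;

/*
 * Linear probing: hash the pointer, then walk forward until the matching
 * slot turns up. Every collision costs one more iteration, so a crowded
 * or badly hashed table can mean thousands of probes for a single free.
 */
static void *
FindInfo(Slot *table, size_t capacity, void *p)
{
    size_t start = ((uintptr_t) p) % capacity;
    size_t i;

    for (i = 0; i < capacity; i++)
    {
        Slot *slot = &table[(start + i) % capacity];

        if (slot->pointer == p)
        {
            return slot->info;
        }
    }

    return NULL; /* not an allocation we know about */
}
```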

One idea to fix this would be to simply store the `MemoryInfo` alongside the returned pointer (at a fixed, known offset), so that the information can be retrieved directly on a free request. This, however, would tend to be pretty dangerous (and wouldn't really allow detecting bad-pointer problems).
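
A minimal sketch of that layout, with a hypothetical `MemoryHeader` standing in for the real `MemoryInfo`:

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Hypothetical header prepended to every allocation; it stands in for
 * whatever the real MemoryInfo tracks. A real version would be padded
 * so the pointer handed back to the caller stays suitably aligned.
 */
typedef struct
{
    size_t size;
} MemoryHeader;

static void *
HeaderAllocate(size_t size)
{
    MemoryHeader *h = malloc(sizeof(MemoryHeader) + size);

    if (!h)
    {
        return NULL;
    }

    h->size = size;

    /* Hand back the region just past the header; the bookkeeping now
     * sits at a fixed, known offset before the returned pointer. */
    return h + 1;
}

static void
HeaderFree(void *p)
{
    if (!p)
    {
        return;
    }

    /* Step back over the header; no table lookup or probing needed. */
    free(((MemoryHeader *) p) - 1);
}
```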

Another would be to use a better hashing function for memory allocations, with lower odds of collisions (and therefore fewer probe attempts).
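
For example, something like Fibonacci hashing spreads aligned pointers out much better than a plain modulo, since the low bits of a heap pointer are almost always zero. This is just an illustrative sketch, not a specific proposal for the implementation:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * One possible pointer hash (Fibonacci hashing): drop the alignment bits,
 * which are almost always zero for heap pointers, then multiply by a
 * 64-bit odd constant so nearby pointers land in distant buckets.
 * capacityBits is log2 of the table size and must be between 1 and 63.
 */
static size_t
PointerHash(void *p, unsigned int capacityBits)
{
    uint64_t x = (uint64_t) (uintptr_t) p;

    x >>= 3;                           /* heap pointers are at least 8-byte aligned on most platforms */
    x *= UINT64_C(0x9E3779B97F4A7C15); /* 2^64 divided by the golden ratio */

    return (size_t) (x >> (64 - capacityBits)); /* keep the best-mixed top bits */
}
```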

Owner

> One idea to fix this would be to simply store the `MemoryInfo` alongside the returned pointer (at a fixed, known offset), so that the information can be retrieved directly on a free request. This, however, would tend to be pretty dangerous (and wouldn't really allow detecting bad-pointer problems).

To my knowledge, this is what pretty much every C allocator does. I was hoping to make `Memory` safer by *not* doing things this way, but since we're running into performance problems, I think maybe this is the best way to do things.

We can store a magic value at the beginning of `MemoryInfo` to make sure that we can identify our allocations, and we can also checksum the `MemoryInfo` to detect accidental modification by other code.
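
As a rough sketch (the `MemoryHeader` fields and the FNV-1a checksum below are just placeholders for whatever the real `MemoryInfo` would actually hold):

```c
#include <stdint.h>
#include <stddef.h>

#define MEMORY_MAGIC 0xDEADBEEFCAFEBABEULL /* arbitrary value identifying our blocks */

/* Hypothetical header; the real MemoryInfo would carry whatever fields
 * Cytoplasm actually tracks for each allocation. */
typedef struct
{
    uint64_t magic;    /* marks the block as one of ours */
    size_t size;
    uint32_t checksum; /* guards the fields above against stray writes */
} MemoryHeader;

/* Simple FNV-1a checksum over everything before the checksum field. */
static uint32_t
HeaderChecksum(const MemoryHeader *h)
{
    const unsigned char *bytes = (const unsigned char *) h;
    size_t len = offsetof(MemoryHeader, checksum);
    uint32_t sum = 2166136261U;
    size_t i;

    for (i = 0; i < len; i++)
    {
        sum ^= bytes[i];
        sum *= 16777619U;
    }

    return sum;
}

/* A pointer is plausibly ours only if its header carries the magic value
 * and an intact checksum; anything else gets reported as a bad pointer. */
static int
HeaderValid(const MemoryHeader *h)
{
    return h != NULL && h->magic == MEMORY_MAGIC
        && h->checksum == HeaderChecksum(h);
}
```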

We would have to work out how to ensure operations like getting a list of all allocations and whatnot still work, but I don't see that being too difficult with a linked list.
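
Something like this could work, with the list links living inside the (hypothetical) header itself so allocate and free stay O(1) and iteration only pays for the allocations that actually exist:

```c
#include <stddef.h>

/*
 * Hypothetical: each header is also a node in a doubly linked list of all
 * live allocations, so enumerating them is a plain list walk and needs no
 * hash table at all. (Locking is omitted to keep the sketch short.)
 */
typedef struct MemoryHeader MemoryHeader;

struct MemoryHeader
{
    MemoryHeader *prev;
    MemoryHeader *next;
    size_t size;
};

static MemoryHeader *allocationsHead;

/* O(1) on allocate: push the new header onto the front of the list. */
static void
ListInsert(MemoryHeader *h)
{
    h->prev = NULL;
    h->next = allocationsHead;

    if (allocationsHead)
    {
        allocationsHead->prev = h;
    }

    allocationsHead = h;
}

/* O(1) on free: unlink the header from wherever it sits. */
static void
ListRemove(MemoryHeader *h)
{
    if (h->prev)
    {
        h->prev->next = h->next;
    }
    else
    {
        allocationsHead = h->next;
    }

    if (h->next)
    {
        h->next->prev = h->prev;
    }
}

/* Enumeration in the spirit of MemoryIterate: visit every live header,
 * paying only for the allocations that actually exist. */
static void
ListIterate(void (*func) (MemoryHeader *, void *), void *args)
{
    MemoryHeader *h;

    for (h = allocationsHead; h; h = h->next)
    {
        func(h, args);
    }
}
```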

Author
Contributor

Also, calling `MemoryIterate` on *every* memory operation seems to grind things down (checking a large number of allocations isn't free, after all), so we might want to find a good solution to that too.
I've been trying to find ways to speed that up, and having the `MemoryInfo` alongside the actual data (with only a magic value to check its validity), while not calling `MemoryIterate` on every Malloc/Realloc/Free, does make an improvement (though my code for it is still very buggy).
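
Roughly, the fast path I've been experimenting with looks like this. It is a self-contained sketch combining the header-offset and magic-value ideas above, not the actual code on my branch, and `FastAllocate`/`FastFree` are made-up names:

```c
#include <stdlib.h>
#include <stdint.h>

#define MEMORY_MAGIC 0xCAFED00DU /* arbitrary illustrative value */

/* Minimal stand-in header: just a magic value in front of the data. */
typedef struct
{
    uint32_t magic;
    size_t size;
} MemoryHeader;

static void *
FastAllocate(size_t size)
{
    MemoryHeader *h = malloc(sizeof(MemoryHeader) + size);

    if (!h)
    {
        return NULL;
    }

    h->magic = MEMORY_MAGIC;
    h->size = size;
    return h + 1;
}

/*
 * The free path touches only the one header in front of the pointer;
 * nothing like a full MemoryIterate pass over every allocation happens.
 */
static void
FastFree(void *p)
{
    MemoryHeader *h;

    if (!p)
    {
        return;
    }

    h = ((MemoryHeader *) p) - 1;

    if (h->magic != MEMORY_MAGIC)
    {
        return; /* not ours, or the header was clobbered; refuse to free */
    }

    h->magic = 0; /* makes an accidental double free detectable */
    free(h);
}
```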

Author
Contributor

Oi, just wanted to point out I've pushed a modified version of my changes over at [`mem-moment`](https://git.telodendria.io/lda/Cytoplasm/src/branch/mem-moment). It still doesn't use `MemoryIterate` (I've yet to find a good way to make it work without causing a severe performance penalty), but from my large JSON parsing tests, it's *very* fast.
