Memory FastCopy / FastSet / FastCmp
LibXR::Memory provides a set of alignment- and burst-optimized memory primitives intended to replace generic memcpy / memset / memcmp on hot paths (such as ring buffer moves and IO TX/RX packing). The implementation selects the widest usable access granularity (8, 4, 2, or 1 bytes) based on pointer alignment and uses loop unrolling to improve throughput.
FastCopy only supports non-overlapping copy semantics (like memcpy, not memmove).
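A caller that cannot rule out overlap may want to guard the call. The following is a minimal sketch of such a guard, not part of LibXR: RangesOverlap and CopyMaybeOverlapping are hypothetical names, and std::memcpy stands in for LibXR::Memory::FastCopy so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a LibXR API): true when the byte ranges
// [a, a + size) and [b, b + size) share at least one byte.
inline bool RangesOverlap(std::uintptr_t a, std::uintptr_t b, std::size_t size) {
  if (size == 0) return false;
  return a < b ? (b - a) < size : (a - b) < size;
}

// Hypothetical wrapper: uses std::memmove when the ranges overlap and the
// non-overlapping fast path otherwise.
inline void CopyMaybeOverlapping(void* dst, const void* src, std::size_t size) {
  assert(size == 0 || (dst != nullptr && src != nullptr));
  // Comparing addresses as integers is implementation-defined but works on
  // flat-address targets.
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  if (RangesOverlap(d, s, size)) {
    std::memmove(dst, src, size);  // overlap-safe
  } else {
    std::memcpy(dst, src, size);   // stand-in for LibXR::Memory::FastCopy
  }
}
```

On hot paths where the buffers are known to be disjoint (for example two separate static arrays), the guard is unnecessary and FastCopy can be called directly.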
API
namespace LibXR {
class Memory {
 public:
  /**
   * @brief Fast memory copy (non-overlapping semantics)
   * @param dst Destination address
   * @param src Source address
   * @param size Number of bytes to copy
   */
  static void FastCopy(void* dst, const void* src, size_t size);
  /**
   * @brief Fast memory fill (memset-like)
   * @param dst Destination address
   * @param value Fill value (repeated per byte)
   * @param size Number of bytes to fill
   */
  static void FastSet(void* dst, uint8_t value, size_t size);
  /**
   * @brief Fast memory compare (memcmp-like)
   * @param a Address A
   * @param b Address B
   * @param size Number of bytes to compare
   * @return 0 if equal; otherwise non-zero. The sign and difference semantics match memcmp (difference of the first mismatching byte).
   */
  static int FastCmp(const void* a, const void* b, size_t size);
};
}  // namespace LibXR
FastCopy Semantics
- If dst and src have the same alignment phase (alignment offset), the implementation first handles the unaligned head bytes, then switches to burst copies using LIBXR_ALIGN_SIZE (typically 8 or 4) with 8x unrolling.
- If their alignment phases differ, it tries to fall back to the largest possible width based on the address delta:
  - When LIBXR_ALIGN_SIZE == 8 and the address delta is a multiple of 4, it can fall back to 4-byte burst copies.
  - When the address delta is even, it can fall back to 2-byte burst copies.
  - Otherwise it falls back to byte-by-byte copying.
- The tail that does not fill a full "wide copy" unit is completed byte-by-byte.
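The width selection described above can be sketched as a small function of the two addresses. This is an illustration under stated assumptions rather than the library's code: PickCopyWidth is a hypothetical name, and the align_size parameter stands in for LIBXR_ALIGN_SIZE (assumed to be 8 or 4).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a LibXR API): returns the widest unit, in bytes,
// that both pointers can be aligned to at the same time once the head bytes
// have been copied. align_size stands in for LIBXR_ALIGN_SIZE.
inline std::size_t PickCopyWidth(std::uintptr_t dst_addr, std::uintptr_t src_addr,
                                 std::size_t align_size) {
  assert(align_size == 8 || align_size == 4);
  // Unsigned subtraction may wrap, but the low bits (all that matter for a
  // power-of-two modulus) stay correct.
  const std::uintptr_t delta = dst_addr - src_addr;

  if (delta % align_size == 0) return align_size;   // same alignment phase
  if (align_size == 8 && delta % 4 == 0) return 4;  // 4-byte fallback
  if (delta % 2 == 0) return 2;                     // 2-byte fallback
  return 1;                                         // byte-by-byte
}
```

The key observation is that two pointers can only reach W-byte alignment together if their address difference is a multiple of W, so the delta alone decides the usable width.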
FastSet Semantics
- Returns immediately if size == 0.
- Writes head bytes until aligned, then performs bulk stores using a LIBXR_ALIGN_SIZE-wide pattern (8 or 4) with 8x unrolling, and finally writes the tail bytes.
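The head / bulk / tail structure can be sketched as follows. This is a simplified illustration, not the LibXR implementation: SketchSet and SketchSetMatchesMemset are hypothetical names, the word size is fixed at 8 bytes (standing in for LIBXR_ALIGN_SIZE == 8), and the 8x unrolling is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch (not a LibXR API) of an aligned, word-wide fill.
inline void SketchSet(void* dst, std::uint8_t value, std::size_t size) {
  if (size == 0) return;
  assert(dst != nullptr);
  auto* p = static_cast<std::uint8_t*>(dst);
  constexpr std::size_t kWord = sizeof(std::uint64_t);

  // Head: single bytes until p is word-aligned.
  while (size > 0 && reinterpret_cast<std::uintptr_t>(p) % kWord != 0) {
    *p++ = value;
    --size;
  }
  // Broadcast the fill byte into every byte lane of a 64-bit pattern.
  const std::uint64_t pattern = 0x0101010101010101ULL * value;
  // Bulk: one aligned word per iteration. std::memcpy keeps the store free
  // of strict-aliasing problems.
  while (size >= kWord) {
    std::memcpy(p, &pattern, kWord);
    p += kWord;
    size -= kWord;
  }
  // Tail: remaining bytes that do not fill a whole word.
  while (size > 0) {
    *p++ = value;
    --size;
  }
}

// Self-check against std::memset for a given start offset and length.
inline bool SketchSetMatchesMemset(std::size_t offset, std::size_t size,
                                   std::uint8_t value) {
  unsigned char got[64] = {};
  unsigned char want[64] = {};
  if (offset + size > sizeof(got)) return false;
  SketchSet(got + offset, value, size);
  std::memset(want + offset, value, size);
  return std::memcmp(got, want, sizeof(got)) == 0;
}
```

Multiplying by 0x0101010101010101 replicates the byte across the word, which is what "repeated per byte" amounts to on the bulk path.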
FastCmp Semantics
- Equivalent to memcmp(a, b, size): returns 0 if equal; otherwise non-zero.
- Returns 0 when size == 0 or a == b.
- If alignment conditions allow, it compares using LIBXR_ALIGN_SIZE-wide loads (8 or 4) with 8x unrolling. When a wide word differs, it falls back to byte-by-byte comparison within that word to produce the same return value semantics as memcmp.
- Falls back to byte-by-byte comparison when alignment conditions do not allow the wide path.
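A possible shape for the wide compare with byte-level fallback is sketched below. It is an illustration only: SketchCmp is a hypothetical name, the word size is fixed at 8 bytes, unrolling is omitted, and "alignment conditions allow" is assumed here to mean that both pointers are 8-byte aligned, which may be stricter than what the library actually requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch (not a LibXR API) of a memcmp-compatible compare.
inline int SketchCmp(const void* a, const void* b, std::size_t size) {
  if (size == 0 || a == b) return 0;
  assert(a != nullptr && b != nullptr);
  auto* pa = static_cast<const std::uint8_t*>(a);
  auto* pb = static_cast<const std::uint8_t*>(b);
  constexpr std::size_t kWord = sizeof(std::uint64_t);

  // Assumed alignment condition: both pointers are word-aligned.
  const bool aligned = reinterpret_cast<std::uintptr_t>(pa) % kWord == 0 &&
                       reinterpret_cast<std::uintptr_t>(pb) % kWord == 0;
  if (aligned) {
    // Wide path: skip over words that are entirely equal.
    while (size >= kWord) {
      std::uint64_t wa;
      std::uint64_t wb;
      std::memcpy(&wa, pa, kWord);
      std::memcpy(&wb, pb, kWord);
      if (wa != wb) break;  // the mismatching byte is located below
      pa += kWord;
      pb += kWord;
      size -= kWord;
    }
  }
  // Byte path: handles the differing word, the tail, and unaligned inputs.
  for (; size > 0; ++pa, ++pb, --size) {
    if (*pa != *pb) return static_cast<int>(*pa) - static_cast<int>(*pb);
  }
  return 0;
}
```

Comparing whole words only for equality, and resolving the sign byte by byte, keeps the result independent of the target's endianness.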
Usage Example
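#include <cstdint>
#include "libxr_def.hpp"
// Wrapper function so the statements compile; the name is illustrative only.
void MemoryExample() {
  uint8_t src[256];
  uint8_t dst[256];
  // ... fill src ...
  // Non-overlapping fast copy
  LibXR::Memory::FastCopy(dst, src, sizeof(src));
  // Fast fill
  LibXR::Memory::FastSet(dst, 0x00, sizeof(dst));
  // Fast compare
  int diff = LibXR::Memory::FastCmp(dst, src, sizeof(src));
  if (diff == 0) {
    // equal
  } else if (diff < 0) {
    // dst < src (first mismatching byte is smaller in dst)
  } else {
    // dst > src (first mismatching byte is larger in dst)
  }
}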