In the test-runner, defines are parameterized constants (limited
to integers) that are generated from the test suite tomls resulting
in many permutations of each test.
In order to make this efficient, these defines are implemented as
multi-layered lookup tables, using per-layer/per-scope indirect
mappings. This lets the test-runner and test suites define their
own defines with compile-time indexes independently. It also makes
building of the lookup tables very efficient, since they can be
incrementally populated as we expand the test permutations.
The four current define layers and when we need to build them:
layer defines predefine_map define_map
user-provided overrides per-run per-run per-suite
per-permutation defines per-perm per-case per-perm
per-geometry defines per-perm compile-time -
default defines compile-time compile-time -
Zephyr has prefixed all of its includes with <zephyr/...>. While the old
mode can still be used (CONFIG_LEGACY_INCLUDE_PATH) and is still enabled
by default, it's better to be prepared for its removal in the future.
Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
- Added filtering based on suite, case, perm, type, geometry
- Added --skip, --count, and --every (will be used for parallelism)
- Implemented --list-defines
- Better helptext for flags with arguments
- Other minor tweaks
- Indirect index map instead of bitmap+sparse array
- test_define_t and test_type_t
- Added back conditional filtering
- Added suite-level defines and filtering
This moves defines entirely into the runtime of the test_runner,
simplifying thing and reducing the amount of generated code that needs
to be build, at the cost of limiting test-defines to uintmax_t types.
This is implemented using a set of index-based scopes (created by
test.py) that allow different layers to override defines from other
layers, accessible through the global `test_define` function.
layers:
1. command-line overrides
2. per-case defines
3. per-geometry defines
This is to try a different design for testing, the goals are to make the
test infrastructure a bit simpler, with clear stages for building and
running, and faster, by avoiding rebuilding lfs.c n-times.
This is possible thanks to invoxiaamo's optimization of compacting
renames to avoid the O(n^3) nested filters. Not only does this
significantly reduce the runtime cost of that operation, but it
reduces the maximum possible depth of recursion to 3 frames.
Deepest lfs_dir_traverse before:
traverse with commit
'-> traverse with filter
'-> traverse with move
'-> traverse with filter
Deepest lfs_dir_traverse after:
traverse with commit
'-> traverse with move
'-> traverse with filter
There are two issues, when using the file-based block device emulation
on Windows Platforms:
1. There is no fsync implementation available. This needs to be mapped
to a Windows-specific FlushFileBuffers system call.
2. The block device file needs to be opened as binary file (O_BINARY)
The corresponding flag is not required for Linux.
Rename can be VERY time consuming. One of the reasons is the 4 recursion
level depth of lfs_dir_traverse() seen if a compaction happened during the
rename.
lfs_dir_compact()
size computation
[1] lfs_dir_traverse(cb=lfs_dir_commit_size)
- do 'duplicates and tag update'
[2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
- Reaching a LFS_FROM_MOVE tag (here)
[3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
- do 'duplicates and tag update'
[4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
followed by the compaction itself:
[1] lfs_dir_traverse(cb=lfs_dir_commit_commit)
- do 'duplicates and tag update'
[2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
- Reaching a LFS_FROM_MOVE tag (here)
[3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
- do 'duplicates and tag update'
[4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
Yet, analyse shows that levels [3] and [4] don't perform anything
if the callback is lfs_dir_traverse_filter...
A practical example:
- format and mount a 4KB block FS
- create 100 files of 256 Bytes named "/dummy_%d"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
This triggers a compaction where lfs_dir_traverse was called 148393 times,
generating 25e6+ lfs_bd_read calls (~100 MB+ of data)
With the optimization, lfs_dir_traverse is now called 3248 times
(589e3 lfs_bds_calls (~2.3MB of data)
=> x 43 improvement...
- nit: Moving brace to end of if statement line for consistency
- mount: add more debug info per CR
- Fix compiler error from extra parentheses
- Fix superblock typo
The basic idea is simple, if we seek to a position in the currently
loaded cache, don't flush the cache. Notably this ensures that seek is
always as fast or faster than just reading the data.
This is a bit tricky since we need to check that our new block and
offset match the cache, fortunately we can skip the block check by
reevaluating the block index for both the current and new positions.
Note this only works whene reading, for writing we need to always flush
the cache, or else we will lose the pending write data.
Interestingly this was introduced by two different PRs which were not tested
together until pre-release testing:
- Fix lfs_file_seek doesn't update cache properties correctly
- Fix compiler warnings when LFS_READONLY defined
BUG CASE:Assume there are 6 blocks in littlefs, block 0,1,2,3 already allocated. 0 has a tail pair of {2, 3}. Now we try to write more into 0.
When writing to block 0, we will split(FIRST SPLIT), thus allocate block 4 and 5. Up to now , everything is as expected.
Then we will try to commit in block 4, during which split(SECOND SPLIT) is triggered again(In our case, some files are large, some are small, one split may not be enough). Still as expected now.
BUG happens when we try to alloc a new block pair for the second split:
As lookahead buffer reaches the end , a new lookahead buffer will be generated from flash content, and block 4, 5 are unused blocks in the new lookahead buffer because they are not programed yet. HOWEVER, block 4,5 should be occupied in the first split!!!!! The result is block 4,5 are allocated again(This is where things are getting wrong).
commit ce2c01f results in this bug. In the commit, a lfs_alloc_ack is inserted in lfs_dir_split, which will cause split to reset lfs->free.ack to block count.
In summary, this problem exists after 2.1.3.
Solution: don't call lfs_alloc_ack in lfs_dir_split.
lfs_dir_traverse is a bit unpleasant in that it is inherently a
recursive function, but without a strict bound of 4 calls (commit -> filter ->
move -> filter), and efforts to unroll the recursion comes at a
signification code cost.
It turns out the best solution I've found so far is to simple create an
explicit stack with an explicit bound of 4 calls (or more accurately,
3 pushed frames).
---
This actually highlights one of the bigger flaws in littlefs right now,
which is that this function, lfs_dir_traverse, takes O(n^2) disk reads
to traverse.
Note that LFS_FROM_MOVE can only occur once per commit, which is why
this code is O(n^2) and not O(n^4).
This mostly just required separate functions for "lfs_file_rawwrite" and
"lfs_file_flushedwrite", since lfs_file_flush recursively invokes
lfs_file_rawread and lfs_file_rawwrite.
This comes at a code cost, but gives us bounded and measurable RAM usage
on this code path.
lfs_dir_commit originally relied heavily on tail-recursion, though at
least one path (through relocations) was not tail-recursive, and could
cause unbounded stack usage in extreme cases of bad blocks. (Keep in
mind even extreme cases of bad blocks should be in scope for littlefs).
In order to remove recursion from this code path, several changed were
raequired:
- The lfs_dir_compact logic had to be somewhat inverted. Instead of
first compacting and then resolving issues such as relocations and
orphans, the overarching lfs_dir_commit now contains a state-machine
which after committing or compacting handles the extra changes to the
filesystem in a single, non-recursive loop
- Instead of fixing all relocations recursively, >1 relocation requires
defering to a full deorphan step. This step is unfortunately an
additional n^2 process. It also required some changes to lfs_deorphan
in order to ignore intentional orphans created as an intermediary in
lfs_mkdir. Maybe in the future we should remove these.
- Tail recursion normally found in lfs_fs_deorphan had to be rewritten
as a loop which restarts any time a new commit causes a relocation.
This does show that the algorithm may not terminate, but only if every
block is bad, which will eventually cause littlefs to run out of
blocks to write to.
As noted in Python's subprocess library:
> This will deadlock when using stdout=PIPE and/or stderr=PIPE and the
> child process generates enough output to a pipe such that it blocks
> waiting for the OS pipe buffer to accept more data.
Curiously, this only became a problem when updating to Ubuntu 20.04
in CI (python3.6 -> python3.8).
Avoids redundant counting of structs shared in multiple .c files, which
is very common. This is different from the other scripts,
code.py/data.py/stack.py, but this difference makes sense as struct
declarations have a very different lifetime.
Introduced when updating CI to Ubuntu 20.04, Ubuntu snaps consume
loop devices, which conflict with out assumption that /dev/loop0
will always be unused. Changed to request a dynamic loop device from
losetup, though it would have been nice if Ubuntu snaps allocated
from the last device or something.
This requires parsing an additional section of the dwarfinfo (--dwarf=rawlines)
to get the declaration file info.
---
Interpreting the results of ./scripts/structs.py reporting is a bit more
complicated than other scripts, structs aren't used in a consistent
manner so the cost of a large struct depends on the context in which it
is used.
But that being said, there really isn't much reason to report
internal-only structs. These structs really only exist for type-checking
in internal algorithms, and their cost will end up reflected in other RAM
measurements, either stack, heap, or other.