opensourcemirror/git - Forgejo: Beyond coding. We forge.

Commit Graph

Author	SHA1	Message	Date
Jeff Hostetler	b316552339	name-hash: add index_dir_find() index_dir_exists() returns a boolean to indicate if there is a case-insensitive match in the directory name-hash, but does not provide the caller with the exact spelling of that match. Create index_dir_find() to do the case-insensitive search and optionally return the spelling of the matched directory prefix in a provided strbuf. To avoid code duplication, convert index_dir_exists() to be a trivial wrapper around the new index_dir_find(). Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-26 15:34:01 -08:00
Elijah Newren	bc5c5ec044	cache.h: remove this no-longer-used header Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	08c46a499a	read-cache*.h: move declarations for read-cache.c functions from cache.h For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	f5653856c2	name-hash.h: move declarations for name-hash.c from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	74ea5c9574	treewide: be explicit about dependence on trace.h & trace2.h Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:08 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in `2174b8c75d` (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Jeff King	02c3c59e62	hashmap: mark unused callback parameters Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:55 -07:00
Junio C Hamano	8e97852919	Merge branch 'ds/sparse-index-protections' Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...	2021-04-30 13:50:26 +09:00
Derrick Stolee	4589bca829	name-hash: use expand_to_path() A sparse-index loads the name-hash data for its entries, including the sparse-directory entries. If a caller asks for a path that is contained within a sparse-directory entry, we need to expand to a full index and recalculate the name hash table before returning the result. Insert calls to expand_to_path() to protect against this case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:48:01 -07:00
Derrick Stolee	5f11669586	name-hash: don't add directories to name_hash Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:47:51 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Derrick Stolee	6a9372f4ef	name-hash: use trace2 regions for init The lazy_init_name_hash() populates a hashset with all filenames and another with all directories represented in the index. This is run only if we need to use the hashsets to check for existence or case-folding renames. Place trace2 regions where there is already a performance trace. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Elijah Newren	6da1a25814	hashmap: provide deallocation function names hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Junio C Hamano	d3096d2ba6	Merge branch 'en/doc-typofix' Docfix. * en/doc-typofix: Fix spelling errors in no-longer-updated-from-upstream modules multimail: fix a few simple spelling errors sha1dc: fix trivial comment spelling error Fix spelling errors in test commands Fix spelling errors in messages shown to users Fix spelling errors in names of tests Fix spelling errors in comments of testcases Fix spelling errors in code comments Fix spelling errors in documentation outside of Documentation/ Documentation: fix a bunch of typos, both old and new	2019-12-01 09:04:35 -08:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
Elijah Newren	77363a51fb	name-hash.c: remove duplicate word in comment Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-07 14:41:00 +09:00
Eric Wong	404ab78e39	hashmap: remove type arg from hashmap_{get,put,remove}_entry Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:12 +09:00
Eric Wong	23dee69f53	OFFSETOF_VAR macro to simplify hashmap iterators While we cannot rely on a `__typeof__' operator being portable to use with `offsetof'; we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	c8e424c9c9	hashmap: introduce hashmap_free_entries `hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	939af16eac	hashmap_cmp_fn takes hashmap_entry params Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f23a465132	hashmap_get{,_from_hash} return "struct hashmap_entry *" Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f0e63c4113	hashmap: use _entry APIs to wrap container_of Using `container_of' can be verbose and choosing names for intermediate "struct hashmap_entry" pointers is a hard problem. So introduce "_entry" APIs inspired by similar linked-list APIs in the Linux kernel. Unfortunately, `__typeof__' is not portable C, so we need an extra parameter to specify the type. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	6bcbdfb277	hashmap_get_next returns "struct hashmap_entry *" This is a step towards removing the requirement for hashmap_entry being the first field of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	28ee794128	hashmap_remove takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b6c5241606	hashmap_get takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b94e5c1df6	hashmap_add takes "struct hashmap_entry " This is less error-prone than "void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	f6eb6bdcf2	hashmap_get_next takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	d22245a2e3	hashmap_entry_init takes "struct hashmap_entry " C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry " to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
René Scharfe	568a05c5ec	cleanup: fix possible overflow errors in binary search, part 2 Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.+.) >> 1' This is a continuation of `19716b21a4` (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-13 11:28:53 -07:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00
Nguyễn Thái Ngọc Duy	2179045fd0	Clean up pthread_create() error handling Normally pthread_create() rarely fails. But with new pthreads wrapper, pthread_create() will return ENOSYS on a system without thread support. Threaded code _is_ protected by HAVE_THREADS and pthread_create() should never run in the first place. But the situation could change in the future and bugs may sneak in. Make sure that all pthread_create() reports the error cause. While at there, mark these strings for translation if they aren't. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-05 13:42:11 +09:00
Nguyễn Thái Ngọc Duy	07642942aa	name-hash.c: remove #ifdef NO_PTHREADS Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-05 13:42:11 +09:00
Nguyễn Thái Ngọc Duy	c46c406ae1	trace.h: support nested performance tracing Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Junio C Hamano	2ac76d8d9d	Merge branch 'bp/name-hash-dirname-fix' "git add" files in the same directory, but spelling the directory path in different cases on case insensitive filesystem, corrupted the name hash data structure and led to unexpected results. This has been corrected. * bp/name-hash-dirname-fix: name-hash: properly fold directory names in adjust_dirname_case()	2018-02-27 10:33:53 -08:00
Ben Peart	c95525e90d	name-hash: properly fold directory names in adjust_dirname_case() Correct the pointer arithmetic in adjust_dirname_case() so that it calls find_dir_entry() with the correct string length. Previously passing in "dir1/foo" would pass a length of 6 instead of the correct 4. This resulted in find_dir_entry() never finding the entry and so the subsequent memcpy that would fold the name to the version with the correct case never executed. Add a test to validate the corrected behavior with name folding of directories. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-08 12:20:56 -08:00
Nguyễn Thái Ngọc Duy	ca54d9baa4	trace: measure where the time is spent in the index-heavy operations All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-02 11:20:16 -08:00
Jeff Hostetler	8b604d1951	hashmap: add API to disable item counting when threaded This is to address concerns raised by ThreadSanitizer on the mailing list about threaded unprotected R/W access to map.size with my previous "disallow rehash" change (`0607e10009`). See: https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/ Add API to hashmap to disable item counting and thus automatic rehashing. Also include API to later re-enable them. When item counting is disabled, the map.size field is invalid. So to prevent accidents, the field has been renamed and an accessor function hashmap_get_size() has been added. All direct references to this field have been been updated. And the name of the field changed to map.private_size to communicate this. Here is the relevant output from ThreadSanitizer showing the problem: WARNING: ThreadSanitizer: data race (pid=10554) Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16): #0 hashmap_add hashmap.c:209 #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302 #2 handle_range_dir name-hash.c:347 #3 handle_range_1 name-hash.c:415 #4 lazy_dir_thread_proc name-hash.c:471 #5 <null> <null> Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31): #0 hashmap_add hashmap.c:209 #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302 #2 handle_range_dir name-hash.c:347 #3 handle_range_1 name-hash.c:415 #4 handle_range_dir name-hash.c:380 #5 handle_range_1 name-hash.c:415 #6 lazy_dir_thread_proc name-hash.c:471 #7 <null> <null> Martin gives instructions for running TSan on test t3008 in this post: https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/ Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-07 09:42:02 +09:00
Stefan Beller	56a14ea7ac	name-hash.c: drop hashmap_cmp_fn cast Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-05 13:53:12 -07:00
Stefan Beller	7663cdc86c	hashmap.h: compare function has access to a data field When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 12:49:28 -07:00
Kevin Willford	2a1bd45b2e	name-hash: fix buffer overrun Add check for the end of the entries for the thread partition. Add test for lazy init name hash with specific directory structure The lazy init hash name was causing a buffer overflow when the last entry in the index was multiple folder deep with parent folders that did not have any files in them. This adds a test for the boundary condition of the thread partitions with the folder structure that was triggering the buffer overflow. The fix was to check if it is the last entry for the thread partition in the handle_range_dir and not try to use the next entry in the cache. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 20:57:18 -07:00
Jeff Hostetler	846df809bc	name-hash: perf improvement for lazy_init_name_hash Improve performance of lazy_init_name_hash() when ignore_case is set. Teach name-hash to build the istate.name_hash and istate.dir_hash simultaneously using a forward-diving technique on the pathname of the index_entry, rather than adding name_hash entries and then searching backwards in the pathname for parent directories. This borrows algorithm ideas from clear_ce_flags_{1,dir}. Multiple threads are used with the new algorithm to speed hashmap construction. This new code path is only used when threads are present (a compiler settings) and when the index is large enough to warrant the pthread complexity. The code in clear_ce_flags_dir() uses a linear search to find the adjacent index entries with the same prefix; a binary search is used here handle_range_dir() to further speed things up. The size of LAZY_THREAD_COST was determined from rough analysis using: t/helper/test-lazy-init-name-hash --analyze Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-24 11:00:03 -07:00
Jeff Hostetler	16f9b4cdd3	name-hash: specify initial size for istate.dir_hash table Specify an initial size for the istate.dir_hash HashMap matching the size of the istate.name_hash. Previously hashmap_init() was given 0, causing a 64 bucket hashmap to be created. When working with very large repositories, this would cause numerous rehash() calls to realloc and rebalance the hashmap. This is especially true when the worktree is deep, with many directories containing a few files. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-22 13:41:41 -07:00
Jeff King	96ffc06f72	convert trivial cases to FLEX_ARRAY macros Using FLEX_ARRAY macros reduces the amount of manual computation size we have to do. It also ensures we don't overflow size_t, and it makes sure we write the same number of bytes that we allocated. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
David Turner	41284eb0f9	name-hash: don't reuse cache_entry in dir_entry Stop reusing cache_entry in dir_entry; doing so causes a use-after-free bug. During merges, we free entries that we no longer need in the destination index. But those entries might have also been stored in the dir_entry cache, and when a later call to add_to_index found them, they would be used after being freed. To prevent this, change dir_entry to store a copy of the name instead of a pointer to a cache_entry. This entails some refactoring of code that expects the cache_entry. Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote the initial patch, but this version does not use any of Keith's code. Helped-by: Keith McGuigan <kmcguigan@twitter.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-21 12:47:38 -07:00
Karsten Blees	ab73a9d119	hashmap: add simplified hashmap_get_from_hash() API Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-07 13:56:35 -07:00
Jeremiah Mahler	be99ec97c8	name-hash.c: replace cache_name_compare() with memcmp(3) The same_name() private function wants a quick-and-exact check to see if they two names are byte-for-byte identical first and then fall back to the slow path. Use memcmp(3) for the former to make it clear that we do not want any "name" specific comparison. Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-20 10:08:10 -07:00
Eric Sunshine	7b359ea6b3	name-hash: retire unused index_name_exists() `db5360f3f4` (name-hash: refactor polymorphic index_name_exists(); 2013-09-17) split index_name_exists() into index_file_exists() and index_dir_exists() but retained index_name_exists() as a thin wrapper to avoid disturbing possible in-flight topics. Since this change landed in 'master' some time ago and there are no in-flight topics referencing index_name_exists(), retire it. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 15:26:33 -08:00
Karsten Blees	419a597f64	name-hash.c: remove cache entries instead of marking them CE_UNHASHED The new hashmap implementation supports remove, so really remove unused cache entries from the name hashmap instead of just marking them. The CE_UNHASHED flag and CE_STATE_MASK are no longer needed. Keep the CE_HASHED flag to prevent adding entries twice. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-18 13:04:24 -08:00

1 2

66 Commits