Merge branch 'en/ort-perf-batch-11'

Optimize out repeated rename detection in a sequence of mergy
operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed
This commit is contained in:
Junio C Hamano 2021-06-14 13:33:26 +09:00
commit 169914ede2
8 changed files with 1804 additions and 37 deletions

View File

@ -0,0 +1,671 @@
Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits. The first parent side of those
merges represent the "upstream" side, and often include a far larger set of
changes than the second parent side. Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge. This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).
Outline:
0. Assumptions
1. How rebasing and cherry-picking work
2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
superset of the renames on MERGE_SIDE1 for the next pick.
3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
a rename on MERGE_SIDE1 for the next pick
4. A detailed description of the the counter-examples to #3.
5. Why the special cases in #4 are still fully reasonable to use to pair
up files for three-way content merging in the merge machinery, and why
they do not affect the correctness of the merge.
6. Interaction with skipping of "irrelevant" renames
7. Additional items that need to be cached
8. How directory rename detection interacts with the above and why this
optimization is still safe even if merge.directoryRenames is set to
"true".
=== 0. Assumptions ===
There are two assumptions that will hold throughout this document:
* The upstream side where commits are transplanted to is treated as the
first parent side when rebase/cherry-pick call the merge machinery
* All merges are fully automatic
and a third that will hold in sections 2-5 for simplicity, that I'll later
address in section 8:
* No directory renames occur
Let me explain more about each assumption and why I include it:
The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementation
of cherry-pick and rebase are not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flag in the documentation of `git checkout`, particularly the
comments about how they behave with rebase). The optimization avoids
checking first-parent-ness, though. It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use. But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.
If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider. The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files". While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or cherry
pick stops for the user to resolve the operation.
The third assumption makes sections 2-5 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply. So, the only
reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8.
=== 1. How rebasing and cherry-picking work ===
Consider the following setup (from the git-rebase manpage):
A---B---C topic
/
D---E---F---G main
After rebasing or cherry-picking topic onto main, this will appear as:
A'--B'--C' topic
/
D---E---F---G main
The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
To create C':
MERGE_BASE: B
MERGE_SIDE1: B'
MERGE_SIDE2: C
Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
slightly different light. For example, in creating C', you can view it as
either:
* Apply the changes between B & C to B'
* Apply the changes between B & B' to C
Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.
=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
=== superset of the renames on MERGE_SIDE1 for the next pick. ===
The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
filename under one of three conditions:
* To make both pieces of a conflict available to a user during conflict
resolution (examples: directory/file conflict, add/add type conflict
such as symlink vs. regular file)
* When MERGE_SIDE1 renames the file.
* When MERGE_SIDE2 renames the file.
First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.
A' is created by the first merge. A' will only have renames for one of the
three reasons listed above. The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case. The third case, a rename
on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename. The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.
=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
=== always also a rename on MERGE_SIDE1 for the next pick. ===
Let's again look at the first two picks:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:
E:oldfile
G:newfile
A:oldfile
and produce a new result:
A':newfile
Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed. In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description). So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames. Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.
The previous paragraph handled the cases for E->A renaming oldfile, let's
continue assuming that oldfile is not renamed in A.
As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
as renames. Note that:
* There will be no A':oldfile (because there could not have been a
G:oldfile as we do not do break detection in the merge machinery and
G:newfile was detected as a rename, and by the construction of the
rename above that merged cleanly, the merge machinery will ensure there
is no 'oldfile' in the result).
* There will be no A:newfile (if there had been, we would have had a
rename/add conflict).
* Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
clean three-way content merge involving A:oldfile).
We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side. Thus we can view A':newfile as
having been created by taking the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) to A:oldfile.
Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences. Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.
=== 4. A detailed description of the counter-examples to #3. ===
We already noted in section 3 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?
Let's repeat an earlier point that section 3 made:
A':newfile was created by applying the changes between E:oldfile and
G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
<50% of the size of E:oldfile.
If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames. However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.
Here's an example where that can happen:
* E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file
* A:oldfile kept the first 3 lines of the file, and deleted all the rest
then
=> A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.
=== 5. Why the special cases in #4 are still fully reasonable to use to ===
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===
In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section
will be devoted to that case as well.
So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:
* As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
is *exactly* the same as the diff between E:oldfile and G:newfile. The
latter pair were detected as renames, so it seems unlikely to surprise
users for us to treat A:oldfile and A':newfile as renames.
* In fact, "oldfile" and "newfile" were at one point detected as renames
due to how they were constructed in the E..G chain. And we used that
information once already in this rebase/cherry-pick. I think users
would be unlikely to be surprised at us continuing to treat the files
as renames and would quickly understand why we had done so.
* Marking or declaring files as renames is *not* the end goal for merges.
Merges use renames to determine which files make sense to be paired up
for three-way content merges.
* A:oldfile and A':newfile were _already_ paired up in a three-way
content merge; that is how A':newfile was created. In fact, that
three-way content merge was clean. So using them again in a later
three-way content merge seems very reasonable.
However, the above is focusing on the common scenarios. Let's try to look
at all possible unusual scenarios and compare without the optimization to
with the optimization. Consider the following theoretical cases; we will
then dive into each to determine which of them are possible,
and if so, what they mean:
1. Without the optimization, the second merge results in a conflict.
With the optimization, the second merge also results in a conflict.
Questions: Are the conflicts confusingly different? Better in one case?
2. Without the optimization, the second merge results in NO conflict.
With the optimization, the second merge also results in NO conflict.
Questions: Are the merges the same?
3. Without the optimization, the second merge results in a conflict.
With the optimization, the second merge results in NO conflict.
Questions: Possible? Bug, bugfix, or something else?
4. Without the optimization, the second merge results in NO conflict.
With the optimization, the second merge results in a conflict.
Questions: Possible? Bug, bugfix, or something else?
I'll consider all four cases, but out of order.
The fourth case is impossible. For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.
From the same logic as the above paragraph, the second case would indeed
result in identical merges. When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it. I'll delete it. And I see you have this
brand new file named 'newfile' in A', so I'll keep it." That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.
The third case is interesting. It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile. This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening. Let's say that
B was an immediate revert of A. B clearly would have been a clean revert
of A, since A was B's immediate parent. One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)
That leaves only the first case to consider -- when we get conflicts both
with or without the optimization. Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two. With the optimization, we'll have a three-way content
merged A:oldfile, A':newfile, and B:oldfile with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve. As noted above, I don't think users will find us treating
'oldfile' and 'newfile' as related as a surprise since they were between E
and G. In any event, though, this case shouldn't be concerning since we
hit a conflict in both cases, told the user what we know, and asked them to
resolve it.
So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.
=== 6. Interaction with skipping of "irrelevant" renames ===
Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:
* 32a56dfb99 ("merge-ort: precompute subset of sources for which we
need rename detection", 2021-03-11)
* 2fd9eda462 ("merge-ort: precompute whether directory rename
detection is needed", 2021-03-11)
* 9bd342137e ("diffcore-rename: determine which relevant_sources are
no longer relevant", 2021-03-13)
Relevance is always determined by what the _other_ side of history has
done, in terms of modifing a file that our side renamed, or adding a
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.
The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits. If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.
=== 7. Additional items that need to be cached ===
It turns out we have to cache more than just renames; we also cache:
A) non-renames (i.e. unpaired deletes)
B) counts of renames within directories
C) sources that were marked as RELEVANT_LOCATION, but which were
downgraded to RELEVANT_NO_MORE
D) the toplevel trees involved in the merge
These are all stored in struct rename_info, and respectively appear in
* cached_pairs (along side actual renames, just with a value of NULL)
* dir_rename_counts
* cached_irrelevant
* merge_trees
The reason for (A) comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit). Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.
The reason for (B) is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.
The reason for (C) is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)
The reason for (D) is that is how we determine whether the remember
renames optimization can be used. In particular, remembering that our
sequence of merges looks like:
Merge 1:
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
=> Creates A'
Merge 2:
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> Creates B'
It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
we store the trees to compare with what we are asked to merge next
time.
=== 8. How directory rename detection interacts with the above and ===
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===
As noted in the assumptions section:
"""
...if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be
thrown away, and thus that there won't be an optimization to apply.
So, the only reason we need to address directory renames specifically,
is that some users will have set merge.directoryRenames to "true" to
allow the merges to continue to proceed automatically.
"""
Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
one:
First pick does this three-way merge:
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
=> creates A'
Second pick does this three-way merge:
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> creates B'
Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory. There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory. Also, directory renames can be done on either side of
history, so there are four cases to consider:
* MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
* MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
* MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
* MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:
* rename caching occurs *after* applying directory renames
* a rename created by directory rename detection is recorded for the side
of history that did the directory rename.
* dir_rename_counts, the nested map of
{oldname => {newname => count}},
is cached between runs as well. This basically means that directory
rename detection is also cached, though only on the side of history
that we cache renames for (MERGE_SIDE1 as far as this document is
concerned; see the assumptions section). Two interesting sub-notes
about these counts:
* If we need to perform rename-detection again on the given side (e.g.
some paths are relevant for rename detection that weren't before),
then we clear dir_rename_counts and recompute it, making use of
cached_pairs. The reason it is important to do this is optimizations
around RELEVANT_LOCATION exist to prevent us from computing
unnecessary renames for directory rename detection and from computing
dir_rename_counts for irrelevant directories; but those same renames
or directories may become necessary for subsequent merges. The
easiest way to "fix up" dir_rename_counts in such cases is to just
recompute it.
* If we prune rename/rename(1to1) entries from the cache, then we also
need to update dir_rename_counts to decrement the counts for the
involved directory and any relevant parent directories (to undo what
update_dir_rename_counts() in diffcore-rename.c incremented when the
rename was initially found). If we instead just disable the
remembering renames optimization when the exceedingly rare
rename/rename(1to1) cases occur, then dir_rename_counts will get
re-computed the next time rename detection occurs, as noted above.
* the side with multiple commits to pick, is the side of history that we
do NOT cache renames for. Thus, there are no additional commits to
change the number of renames in a directory, except for those done by
directory rename detection (which always pad the majority).
* the "renames" we cache are modified slightly by any directory rename,
as noted below.
Now, with those notes out of the way, let's go through the four cases
in order:
Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Renames olddir/ -> newdir/
MERGE_SIDE2: A, Adds olddir/newfile
=> creates A', With newdir/newfile
MERGE_BASE: A, Has olddir/newfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Modifies olddir/newfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers olddir/ -> newdir/
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
This case looks like this:
MERGE_BASE: E oldfile, olddir/
MERGE_SIDE1: G oldfile, olddir/ -> newdir/
MERGE_SIDE2: A oldfile -> olddir/newfile
=> creates A', With newdir/newfile representing original oldfile
MERGE_BASE: A olddir/newfile
MERGE_SIDE1: A' newdir/newfile
MERGE_SIDE2: B modify olddir/newfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers olddir/ -> newdir/
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
(NOT oldfile -> newdir/newfile; compare to case with
(p->status == 'R' && new_path) in possibly_cache_new_pair())
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Adds olddir/newfile
MERGE_SIDE2: A, Renames olddir/ -> newdir/
=> creates A', With newdir/newfile
MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile
=> expected B', with newdir/newfile from A'
In this case, with the optimization, note that after the first commit there
were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
But the second merge didn't need any renames so this is fine.
Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Renames oldfile -> olddir/newfile
MERGE_SIDE2: A, Renames olddir/ -> newdir/
=> creates A', With newdir/newfile representing original oldfile
MERGE_BASE: A, Has oldfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Modifies oldfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers oldfile -> newdir/newfile
(NOT oldfile -> olddir/newfile; compare to case of second
block under p->status == 'R' in possibly_cache_new_pair())
* MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization means we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself. In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid cacheing some kind of "this
directory was not renamed" statement. If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.

View File

@ -568,7 +568,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
static void initialize_dir_rename_info(struct dir_rename_info *info,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
struct strmap *dir_rename_count)
struct strmap *dir_rename_count,
struct strmap *cached_pairs)
{
struct hashmap_iter iter;
struct strmap_entry *entry;
@ -633,6 +634,17 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
rename_dst[i].p->two->path);
}
/* Add cached_pairs to counts */
strmap_for_each_entry(cached_pairs, &iter, entry) {
const char *old_name = entry->key;
const char *new_name = entry->value;
if (!new_name)
/* known delete; ignore it */
continue;
update_dir_rename_counts(info, dirs_removed, old_name, new_name);
}
/*
* Now we collapse
* dir_rename_count: old_directory -> {new_directory -> count}
@ -1247,7 +1259,8 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
void diffcore_rename_extended(struct diff_options *options,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
struct strmap *dir_rename_count)
struct strmap *dir_rename_count,
struct strmap *cached_pairs)
{
int detect_rename = options->detect_rename;
int minimum_score = options->rename_score;
@ -1363,7 +1376,8 @@ void diffcore_rename_extended(struct diff_options *options,
/* Preparation for basename-driven matching. */
trace2_region_enter("diff", "dir rename setup", options->repo);
initialize_dir_rename_info(&info, relevant_sources,
dirs_removed, dir_rename_count);
dirs_removed, dir_rename_count,
cached_pairs);
trace2_region_leave("diff", "dir rename setup", options->repo);
/* Utilize file basenames to quickly find renames. */
@ -1560,5 +1574,5 @@ void diffcore_rename_extended(struct diff_options *options,
void diffcore_rename(struct diff_options *options)
{
diffcore_rename_extended(options, NULL, NULL, NULL);
diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
}

View File

@ -181,7 +181,8 @@ void diffcore_rename(struct diff_options *);
void diffcore_rename_extended(struct diff_options *options,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
struct strmap *dir_rename_count);
struct strmap *dir_rename_count,
struct strmap *cached_pairs);
void diffcore_merge_broken(void);
void diffcore_pickaxe(struct diff_options *);
void diffcore_order(const char *orderfile);

View File

@ -53,6 +53,8 @@ enum merge_side {
MERGE_SIDE2 = 2
};
static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */
struct traversal_callback_data {
unsigned long mask;
unsigned long dirmask;
@ -140,6 +142,72 @@ struct rename_info {
int callback_data_nr, callback_data_alloc;
char *callback_data_traverse_path;
/*
* merge_trees: trees passed to the merge algorithm for the merge
*
* merge_trees records the trees passed to the merge algorithm. But,
* this data also is stored in merge_result->priv. If a sequence of
* merges are being done (such as when cherry-picking or rebasing),
* the next merge can look at this and re-use information from
* previous merges under certain circumstances.
*
* See also all the cached_* variables.
*/
struct tree *merge_trees[3];
/*
* cached_pairs_valid_side: which side's cached info can be reused
*
* See the description for merge_trees. For repeated merges, at most
* only one side's cached information can be used. Valid values:
* MERGE_SIDE2: cached data from side2 can be reused
* MERGE_SIDE1: cached data from side1 can be reused
* 0: no cached data can be reused
*/
int cached_pairs_valid_side;
/*
* cached_pairs: Caching of renames and deletions.
*
* These are mappings recording renames and deletions of individual
* files (not directories). They are thus a map from an old
* filename to either NULL (for deletions) or a new filename (for
* renames).
*/
struct strmap cached_pairs[3];
/*
* cached_target_names: just the destinations from cached_pairs
*
* We sometimes want a fast lookup to determine if a given filename
* is one of the destinations in cached_pairs. cached_target_names
* is thus duplicative information, but it provides a fast lookup.
*/
struct strset cached_target_names[3];
/*
* cached_irrelevant: Caching of rename_sources that aren't relevant.
*
* If we try to detect a rename for a source path and succeed, it's
* part of a rename. If we try to detect a rename for a source path
* and fail, then it's a delete. If we do not try to detect a rename
* for a path, then we don't know if it's a rename or a delete. If
* merge-ort doesn't think the path is relevant, then we just won't
* cache anything for that path. But there's a slight problem in
* that merge-ort can think a path is RELEVANT_LOCATION, but due to
* commit 9bd342137e ("diffcore-rename: determine which
* relevant_sources are no longer relevant", 2021-03-13),
* diffcore-rename can downgrade the path to RELEVANT_NO_MORE. To
* avoid excessive calls to diffcore_rename_extended() we still need
* to cache such paths, though we cannot record them as either
* renames or deletes. So we cache them here as a "turned out to be
* irrelevant *for this commit*" as they are often also irrelevant
* for subsequent commits, though we will have to do some extra
* checking to see whether such paths become relevant for rename
* detection when cherry-picking/rebasing subsequent commits.
*/
struct strset cached_irrelevant[3];
/*
* needed_limit: value needed for inexact rename detection to run
*
@ -382,6 +450,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
reinitialize ? strmap_partial_clear : strmap_clear;
void (*strintmap_func)(struct strintmap *) =
reinitialize ? strintmap_partial_clear : strintmap_clear;
void (*strset_func)(struct strset *) =
reinitialize ? strset_partial_clear : strset_clear;
/*
* We marked opti->paths with strdup_strings = 0, so that we
@ -417,15 +487,21 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
/* Free memory used by various renames maps */
for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
strintmap_func(&renames->dirs_removed[i]);
partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
if (!reinitialize)
strmap_clear(&renames->dir_rename_count[i], 1);
strmap_func(&renames->dir_renames[i], 0);
strintmap_func(&renames->relevant_sources[i]);
if (!reinitialize)
assert(renames->cached_pairs_valid_side == 0);
if (i != renames->cached_pairs_valid_side) {
strset_func(&renames->cached_target_names[i]);
strmap_func(&renames->cached_pairs[i], 1);
strset_func(&renames->cached_irrelevant[i]);
partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
if (!reinitialize)
strmap_clear(&renames->dir_rename_count[i], 1);
}
}
renames->cached_pairs_valid_side = 0;
renames->dir_rename_mask = 0;
if (!reinitialize) {
struct hashmap_iter iter;
@ -448,8 +524,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
strmap_clear(&opti->output, 0);
}
renames->dir_rename_mask = 0;
/* Clean out callback_data as well. */
FREE_AND_NULL(renames->callback_data);
renames->callback_data_nr = renames->callback_data_alloc = 0;
@ -690,15 +764,48 @@ static void add_pair(struct merge_options *opt,
struct rename_info *renames = &opt->priv->renames;
int names_idx = is_add ? side : 0;
if (!is_add) {
if (is_add) {
if (strset_contains(&renames->cached_target_names[side],
pathname))
return;
} else {
unsigned content_relevant = (match_mask == 0);
unsigned location_relevant = (dir_rename_mask == 0x07);
/*
* If pathname is found in cached_irrelevant[side] due to
* previous pick but for this commit content is relevant,
* then we need to remove it from cached_irrelevant.
*/
if (content_relevant)
/* strset_remove is no-op if strset doesn't have key */
strset_remove(&renames->cached_irrelevant[side],
pathname);
/*
* We do not need to re-detect renames for paths that we already
* know the pairing, i.e. for cached_pairs (or
* cached_irrelevant). However, handle_deferred_entries() needs
* to loop over the union of keys from relevant_sources[side] and
* cached_pairs[side], so for simplicity we set relevant_sources
* for all the cached_pairs too and then strip them back out in
* prune_cached_from_relevant() at the beginning of
* detect_regular_renames().
*/
if (content_relevant || location_relevant) {
/* content_relevant trumps location_relevant */
strintmap_set(&renames->relevant_sources[side], pathname,
content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION);
}
/*
* Avoid creating pair if we've already cached rename results.
* Note that we do this after setting relevant_sources[side]
* as noted in the comment above.
*/
if (strmap_contains(&renames->cached_pairs[side], pathname) ||
strset_contains(&renames->cached_irrelevant[side], pathname))
return;
}
one = alloc_filespec(pathname);
@ -2037,6 +2144,9 @@ static int process_renames(struct merge_options *opt,
VERIFY_CI(side2);
if (!strcmp(pathnames[1], pathnames[2])) {
struct rename_info *ri = &opt->priv->renames;
int j;
/* Both sides renamed the same way */
assert(side1 == side2);
memcpy(&side1->stages[0], &base->stages[0],
@ -2046,6 +2156,16 @@ static int process_renames(struct merge_options *opt,
base->merged.is_null = 1;
base->merged.clean = 1;
/*
* Disable remembering renames optimization;
* rename/rename(1to1) is incredibly rare, and
* just disabling the optimization is easier
* than purging cached_pairs,
* cached_target_names, and dir_rename_counts.
*/
for (j = 0; j < 3; j++)
ri->merge_trees[j] = NULL;
/* We handled both renames, i.e. i+1 handled */
i++;
/* Move to next rename */
@ -2273,7 +2393,9 @@ static inline int possible_side_renames(struct rename_info *renames,
static inline int possible_renames(struct rename_info *renames)
{
return possible_side_renames(renames, 1) ||
possible_side_renames(renames, 2);
possible_side_renames(renames, 2) ||
!strmap_empty(&renames->cached_pairs[1]) ||
!strmap_empty(&renames->cached_pairs[2]);
}
static void resolve_diffpair_statuses(struct diff_queue_struct *q)
@ -2297,6 +2419,112 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q)
}
}
static void prune_cached_from_relevant(struct rename_info *renames,
unsigned side)
{
/* Reason for this function described in add_pair() */
struct hashmap_iter iter;
struct strmap_entry *entry;
/* Remove from relevant_sources all entries in cached_pairs[side] */
strmap_for_each_entry(&renames->cached_pairs[side], &iter, entry) {
strintmap_remove(&renames->relevant_sources[side],
entry->key);
}
/* Remove from relevant_sources all entries in cached_irrelevant[side] */
strset_for_each_entry(&renames->cached_irrelevant[side], &iter, entry) {
strintmap_remove(&renames->relevant_sources[side],
entry->key);
}
}
static void use_cached_pairs(struct merge_options *opt,
struct strmap *cached_pairs,
struct diff_queue_struct *pairs)
{
struct hashmap_iter iter;
struct strmap_entry *entry;
/*
* Add to side_pairs all entries from renames->cached_pairs[side_index].
* (Info in cached_irrelevant[side_index] is not relevant here.)
*/
strmap_for_each_entry(cached_pairs, &iter, entry) {
struct diff_filespec *one, *two;
const char *old_name = entry->key;
const char *new_name = entry->value;
if (!new_name)
new_name = old_name;
/* We don't care about oid/mode, only filenames and status */
one = alloc_filespec(old_name);
two = alloc_filespec(new_name);
diff_queue(pairs, one, two);
pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
}
}
static void cache_new_pair(struct rename_info *renames,
int side,
char *old_path,
char *new_path,
int free_old_value)
{
char *old_value;
new_path = xstrdup(new_path);
old_value = strmap_put(&renames->cached_pairs[side],
old_path, new_path);
strset_add(&renames->cached_target_names[side], new_path);
if (free_old_value)
free(old_value);
else
assert(!old_value);
}
static void possibly_cache_new_pair(struct rename_info *renames,
struct diff_filepair *p,
unsigned side,
char *new_path)
{
int dir_renamed_side = 0;
if (new_path) {
/*
* Directory renames happen on the other side of history from
* the side that adds new files to the old directory.
*/
dir_renamed_side = 3 - side;
} else {
int val = strintmap_get(&renames->relevant_sources[side],
p->one->path);
if (val == RELEVANT_NO_MORE) {
assert(p->status == 'D');
strset_add(&renames->cached_irrelevant[side],
p->one->path);
}
if (val <= 0)
return;
}
if (p->status == 'D') {
/*
* If we already had this delete, we'll just set it's value
* to NULL again, so no harm.
*/
strmap_put(&renames->cached_pairs[side], p->one->path, NULL);
} else if (p->status == 'R') {
if (!new_path)
new_path = p->two->path;
else
cache_new_pair(renames, dir_renamed_side,
p->two->path, new_path, 0);
cache_new_pair(renames, side, p->one->path, new_path, 1);
} else if (p->status == 'A' && new_path) {
cache_new_pair(renames, dir_renamed_side,
p->two->path, new_path, 0);
}
}
static int compare_pairs(const void *a_, const void *b_)
{
const struct diff_filepair *a = *((const struct diff_filepair **)a_);
@ -2312,6 +2540,7 @@ static void detect_regular_renames(struct merge_options *opt,
struct diff_options diff_opts;
struct rename_info *renames = &opt->priv->renames;
prune_cached_from_relevant(renames, side_index);
if (!possible_side_renames(renames, side_index)) {
/*
* No rename detection needed for this side, but we still need
@ -2322,6 +2551,7 @@ static void detect_regular_renames(struct merge_options *opt,
return;
}
partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]);
repo_diff_setup(opt->repo, &diff_opts);
diff_opts.flags.recursive = 1;
diff_opts.flags.rename_empty = 0;
@ -2339,7 +2569,8 @@ static void detect_regular_renames(struct merge_options *opt,
diffcore_rename_extended(&diff_opts,
&renames->relevant_sources[side_index],
&renames->dirs_removed[side_index],
&renames->dir_rename_count[side_index]);
&renames->dir_rename_count[side_index],
&renames->cached_pairs[side_index]);
trace2_region_leave("diff", "diffcore_rename", opt->repo);
resolve_diffpair_statuses(&diff_queued_diff);
@ -2379,6 +2610,7 @@ static int collect_renames(struct merge_options *opt,
char *new_path; /* non-NULL only with directory renames */
if (p->status != 'A' && p->status != 'R') {
possibly_cache_new_pair(renames, p, side_index, NULL);
diff_free_filepair(p);
continue;
}
@ -2390,6 +2622,7 @@ static int collect_renames(struct merge_options *opt,
&collisions,
&clean);
possibly_cache_new_pair(renames, p, side_index, new_path);
if (p->status != 'R' && !new_path) {
diff_free_filepair(p);
continue;
@ -2445,6 +2678,8 @@ static int detect_and_process_renames(struct merge_options *opt,
trace2_region_enter("merge", "regular renames", opt->repo);
detect_regular_renames(opt, MERGE_SIDE1);
detect_regular_renames(opt, MERGE_SIDE2);
use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]);
use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]);
trace2_region_leave("merge", "regular renames", opt->repo);
trace2_region_enter("merge", "directory renames", opt->repo);
@ -3635,6 +3870,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
assert(opt->obuf.len == 0);
assert(opt->priv == NULL);
if (result->_properly_initialized != 0 &&
result->_properly_initialized != RESULT_INITIALIZED)
BUG("struct merge_result passed to merge_incore_*recursive() must be zeroed or filled with values from a previous run");
assert(!!result->priv == !!result->_properly_initialized);
if (result->priv) {
opt->priv = result->priv;
result->priv = NULL;
@ -3674,8 +3913,22 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
NULL, 1);
strmap_init_with_options(&renames->dir_renames[i],
NULL, 0);
/*
* relevant_sources uses -1 for the default, because we need
* to be able to distinguish not-in-strintmap from valid
* relevant_source values from enum file_rename_relevance.
* In particular, possibly_cache_new_pair() expects a negative
* value for not-found entries.
*/
strintmap_init_with_options(&renames->relevant_sources[i],
0, NULL, 0);
-1 /* explicitly invalid */,
NULL, 0);
strmap_init_with_options(&renames->cached_pairs[i],
NULL, 1);
strset_init_with_options(&renames->cached_irrelevant[i],
NULL, 1);
strset_init_with_options(&renames->cached_target_names[i],
NULL, 0);
}
/*
@ -3701,6 +3954,50 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
trace2_region_leave("merge", "allocate/init", opt->repo);
}
static void merge_check_renames_reusable(struct merge_options *opt,
struct merge_result *result,
struct tree *merge_base,
struct tree *side1,
struct tree *side2)
{
struct rename_info *renames;
struct tree **merge_trees;
struct merge_options_internal *opti = result->priv;
if (!opti)
return;
renames = &opti->renames;
merge_trees = renames->merge_trees;
/*
* Handle case where previous merge operation did not want cache to
* take effect, e.g. because rename/rename(1to1) makes it invalid.
*/
if (!merge_trees[0]) {
assert(!merge_trees[0] && !merge_trees[1] && !merge_trees[2]);
renames->cached_pairs_valid_side = 0; /* neither side valid */
return;
}
/*
* Handle other cases; note that merge_trees[0..2] will only
* be NULL if opti is, or if all three were manually set to
* NULL by e.g. rename/rename(1to1) handling.
*/
assert(merge_trees[0] && merge_trees[1] && merge_trees[2]);
/* Check if we meet a condition for re-using cached_pairs */
if (oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) &&
oideq(&side1->object.oid, &result->tree->object.oid))
renames->cached_pairs_valid_side = MERGE_SIDE1;
else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) &&
oideq(&side2->object.oid, &result->tree->object.oid))
renames->cached_pairs_valid_side = MERGE_SIDE2;
else
renames->cached_pairs_valid_side = 0; /* neither side valid */
}
/*** Function Grouping: merge_incore_*() and their internal variants ***/
/*
@ -3751,6 +4048,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
result->clean &= strmap_empty(&opt->priv->conflicted);
if (!opt->priv->call_depth) {
result->priv = opt->priv;
result->_properly_initialized = RESULT_INITIALIZED;
opt->priv = NULL;
}
}
@ -3848,7 +4146,16 @@ void merge_incore_nonrecursive(struct merge_options *opt,
trace2_region_enter("merge", "merge_start", opt->repo);
assert(opt->ancestor != NULL);
merge_check_renames_reusable(opt, result, merge_base, side1, side2);
merge_start(opt, result);
/*
* Record the trees used in this merge, so if there's a next merge in
* a cherry-pick or rebase sequence it might be able to take advantage
* of the cached_pairs in that next merge.
*/
opt->priv->renames.merge_trees[0] = merge_base;
opt->priv->renames.merge_trees[1] = side1;
opt->priv->renames.merge_trees[2] = side2;
trace2_region_leave("merge", "merge_start", opt->repo);
merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);

View File

@ -29,6 +29,8 @@ struct merge_result {
* !clean) and to print "CONFLICT" messages. Not for external use.
*/
void *priv;
/* Also private */
unsigned _properly_initialized;
};
/*

View File

@ -91,7 +91,6 @@ int cmd__fast_rebase(int argc, const char **argv)
struct commit *last_commit = NULL, *last_picked_commit = NULL;
struct object_id head;
struct lock_file lock = LOCK_INIT;
int clean = 1;
struct strvec rev_walk_args = STRVEC_INIT;
struct rev_info revs;
struct commit *commit;
@ -124,7 +123,8 @@ int cmd__fast_rebase(int argc, const char **argv)
assert(oideq(&onto->object.oid, &head));
hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
assert(repo_read_index(the_repository) >= 0);
if (repo_read_index(the_repository) < 0)
BUG("Could not read index");
repo_init_revisions(the_repository, &revs, NULL);
revs.verbose_header = 1;
@ -175,11 +175,10 @@ int cmd__fast_rebase(int argc, const char **argv)
free((char*)merge_opt.ancestor);
merge_opt.ancestor = NULL;
if (!result.clean)
die("Aborting: Hit a conflict and restarting is not implemented.");
break;
last_picked_commit = commit;
last_commit = create_commit(result.tree, commit, last_commit);
}
fprintf(stderr, "\nDone.\n");
/* TODO: There should be some kind of rev_info_free(&revs) call... */
memset(&revs, 0, sizeof(revs));
@ -188,24 +187,39 @@ int cmd__fast_rebase(int argc, const char **argv)
if (result.clean < 0)
exit(128);
strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
oid_to_hex(&last_picked_commit->object.oid),
oid_to_hex(&last_commit->object.oid));
if (update_ref(reflog_msg.buf, branch_name.buf,
&last_commit->object.oid,
&last_picked_commit->object.oid,
REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
error(_("could not update %s"), argv[4]);
die("Failed to update %s", argv[4]);
}
if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
die(_("unable to update HEAD"));
strbuf_release(&reflog_msg);
strbuf_release(&branch_name);
if (result.clean) {
fprintf(stderr, "\nDone.\n");
strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
oid_to_hex(&last_picked_commit->object.oid),
oid_to_hex(&last_commit->object.oid));
if (update_ref(reflog_msg.buf, branch_name.buf,
&last_commit->object.oid,
&last_picked_commit->object.oid,
REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
error(_("could not update %s"), argv[4]);
die("Failed to update %s", argv[4]);
}
if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
die(_("unable to update HEAD"));
strbuf_release(&reflog_msg);
strbuf_release(&branch_name);
prime_cache_tree(the_repository, the_repository->index, result.tree);
prime_cache_tree(the_repository, the_repository->index,
result.tree);
} else {
fprintf(stderr, "\nAborting: Hit a conflict.\n");
strbuf_addf(&reflog_msg, "rebase progress up to %s",
oid_to_hex(&last_picked_commit->object.oid));
if (update_ref(reflog_msg.buf, "HEAD",
&last_commit->object.oid,
&head,
REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
error(_("could not update %s"), argv[4]);
die("Failed to update %s", argv[4]);
}
}
if (write_locked_index(&the_index, &lock,
COMMIT_LOCK | SKIP_IF_UNCHANGED))
die(_("unable to write %s"), get_index_file());
return (clean == 0);
return (result.clean == 0);
}

View File

@ -4966,6 +4966,64 @@ test_expect_success '12g: Testcase with two kinds of "relevant" renames' '
)
'
# Testcase 12h, Testcase with two kinds of "relevant" renames
# Commit O: olddir/{a_1, b}
# Commit A: newdir/{a_2, b}
# Commit B: olddir/{alpha_1, b}
# Expected: newdir/{alpha_2, b}
test_setup_12h () {
test_create_repo 12h &&
(
cd 12h &&
mkdir olddir &&
test_seq 3 8 >olddir/a &&
>olddir/b &&
git add olddir &&
git commit -m orig &&
git branch O &&
git branch A &&
git branch B &&
git switch A &&
test_seq 3 10 >olddir/a &&
git add olddir/a &&
git mv olddir newdir &&
git commit -m A &&
git switch B &&
git mv olddir/a olddir/alpha &&
git commit -m B
)
}
test_expect_failure '12h: renaming a file within a renamed directory' '
test_setup_12h &&
(
cd 12h &&
git checkout A^0 &&
test_might_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
git ls-files >tracked &&
test_line_count = 2 tracked &&
test_path_is_missing olddir/a &&
test_path_is_file newdir/alpha &&
test_path_is_file newdir/b &&
git rev-parse >actual \
HEAD:newdir/alpha HEAD:newdir/b &&
git rev-parse >expect \
A:newdir/a O:oldir/b &&
test_cmp expect actual
)
'
###########################################################################
# SECTION 13: Checking informational and conflict messages
#

View File

@ -0,0 +1,700 @@
#!/bin/sh
test_description="remember regular & dir renames in sequence of merges"
. ./test-lib.sh
#
# NOTE 1: this testfile tends to not only rename files, but modify on both
# sides; without modifying on both sides, optimizations can kick in
# which make rename detection irrelevant or trivial. We want to make
# sure that we are triggering rename caching rather than rename
# bypassing.
#
# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either
# cherry-pick or rebase. sequencer.c is only superficially
# integrated with merge-ort; it calls merge_switch_to_result()
# after EACH merge, which updates the index and working copy AND
# throws away the cached results (because merge_switch_to_result()
# is only supposed to be called at the end of the sequence).
# Integrating them more deeply is a big task, so for now the tests
# use 'test-tool fast-rebase'.
#
#
# In the following simple testcase:
# Base: numbers_1, values_1
# Upstream: numbers_2, values_2
# Topic_1: sequence_3
# Topic_2: scruples_3
# or, in english, rename numbers -> sequence in the first commit, and rename
# values -> scruples in the second commit.
#
# This shouldn't be a challenge, it's just verifying that cached renames isn't
# preventing us from finding new renames.
#
test_expect_success 'caching renames does not preclude finding new ones' '
test_create_repo caching-renames-and-new-renames &&
(
cd caching-renames-and-new-renames &&
test_seq 2 10 >numbers &&
test_seq 2 10 >values &&
git add numbers values &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 10 >numbers &&
test_seq 1 10 >values &&
git add numbers values &&
git commit -m "Tweaked both files" &&
git switch topic &&
test_seq 2 12 >numbers &&
git add numbers &&
git mv numbers sequence &&
git commit -m A &&
test_seq 2 12 >values &&
git add values &&
git mv values scruples &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic
git ls-files >tracked-files &&
test_line_count = 2 tracked-files &&
test_seq 1 12 >expect &&
test_cmp expect sequence &&
test_cmp expect scruples
)
'
#
# In the following testcase:
# Base: numbers_1
# Upstream: rename numbers_1 -> sequence_2
# Topic_1: numbers_3
# Topic_2: numbers_1
# or, in english, the first commit on the topic branch modifies numbers by
# shrinking it (dramatically) and the second commit on topic reverts its
# parent.
#
# Can git apply both patches?
#
# Traditional cherry-pick/rebase will fail to apply the second commit, the
# one that reverted its parent, because despite detecting the rename from
# 'numbers' to 'sequence' for the first commit, it fails to detect that
# rename when picking the second commit. That's "reasonable" given the
# dramatic change in size of the file, but remembering the rename and
# reusing it is reasonable too.
#
# We do test here that we expect rename detection to only be run once total
# (the topic side of history doesn't need renames, and with caching we
# should be able to only run rename detection on the upstream side one
# time.)
test_expect_success 'cherry-pick both a commit and its immediate revert' '
test_create_repo pick-commit-and-its-immediate-revert &&
(
cd pick-commit-and-its-immediate-revert &&
test_seq 11 30 >numbers &&
git add numbers &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 30 >numbers &&
git add numbers &&
git mv numbers sequence &&
git commit -m "Renamed (and modified) numbers -> sequence" &&
git switch topic &&
test_seq 11 13 >numbers &&
git add numbers &&
git commit -m A &&
git revert HEAD &&
#
# Actual testing
#
git switch upstream &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 1 calls
)
'
#
# In the following testcase:
# Base: sequence_1
# Upstream: rename sequence_1 -> values_2
# Topic_1: rename sequence_1 -> values_3
# Topic_2: add unrelated sequence_4
# or, in english, both sides rename sequence -> values, and then the second
# commit on the topic branch adds an unrelated file called sequence.
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same rename in effect "uses it up" and if it remains cached,
# could cause a spurious rename/add conflict.
#
test_expect_success 'rename same file identically, then reintroduce it' '
test_create_repo rename-rename-1to1-then-add-old-filename &&
(
cd rename-rename-1to1-then-add-old-filename &&
test_seq 3 8 >sequence &&
git add sequence &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >sequence &&
git add sequence &&
git mv sequence values &&
git commit -m "Renamed (and modified) sequence -> values" &&
git switch topic &&
test_seq 3 10 >sequence &&
git add sequence &&
git mv sequence values &&
git commit -m A &&
test_write_lines A B C D E F G H I J >sequence &&
git add sequence &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
git ls-files >tracked &&
test_line_count = 2 tracked &&
test_path_is_file values &&
test_path_is_file sequence &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 2 calls
)
'
#
# In the following testcase:
# Base: olddir/{valuesZ_1, valuesY_1, valuesX_1}
# Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2
# rename olddir/valuesY_1 -> dirA/valuesY_2
# rename olddir/valuesX_1 -> dirB/valuesX_2
# Topic_1: rename olddir/valuesZ_1 -> dirA/valuesZ_3
# rename olddir/valuesY_1 -> dirA/valuesY_3
# Topic_2: add olddir/newfile
# Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX
# Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile}
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same renames in effect "use it up" but if the renames remain
# cached, the directory rename could put newfile in the wrong directory.
#
test_expect_success 'rename same file identically, then add file to old dir' '
test_create_repo rename-rename-1to1-then-add-file-to-old-dir &&
(
cd rename-rename-1to1-then-add-file-to-old-dir &&
mkdir olddir/ &&
test_seq 3 8 >olddir/valuesZ &&
test_seq 3 8 >olddir/valuesY &&
test_seq 3 8 >olddir/valuesX &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >olddir/valuesZ &&
test_seq 1 8 >olddir/valuesY &&
test_seq 1 8 >olddir/valuesX &&
git add olddir &&
mkdir dirA &&
git mv olddir/valuesZ olddir/valuesY dirA &&
git mv olddir/ dirB/ &&
git commit -m "Renamed (and modified) values*" &&
git switch topic &&
test_seq 3 10 >olddir/valuesZ &&
test_seq 3 10 >olddir/valuesY &&
git add olddir &&
mkdir dirA &&
git mv olddir/valuesZ olddir/valuesY dirA &&
git commit -m A &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
git ls-files >tracked &&
test_line_count = 4 tracked &&
test_path_is_file dirA/valuesZ &&
test_path_is_file dirA/valuesY &&
test_path_is_file dirB/valuesX &&
test_path_is_file dirB/newfile &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 3 calls
)
'
#
# In the following testcase, upstream renames a directory, and the topic branch
# first adds a file to the directory, then later renames the directory
# differently:
# Base: olddir/a
# olddir/b
# Upstream: rename olddir/ -> newdir/
# Topic_1: add olddir/newfile
# Topic_2: rename olddir/ -> otherdir/
#
# Here we are just concerned that cached renames might prevent us from seeing
# the rename conflict, and we want to ensure that we do get a conflict.
#
# While at it, though, we do test that we only try to detect renames 2
# times and not three. (The first merge needs to detect renames on the
# upstream side. Traditionally, the second merge would need to detect
# renames on both sides of history, but our caching of upstream renames
# should avoid the need to re-detect upstream renames.)
#
test_expect_success 'cached dir rename does not prevent noticing later conflict' '
test_create_repo dir-rename-cache-not-occluding-later-conflict &&
(
cd dir-rename-cache-not-occluding-later-conflict &&
mkdir olddir &&
test_seq 3 10 >olddir/a &&
test_seq 3 10 >olddir/b &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 3 10 >olddir/a &&
test_seq 3 10 >olddir/b &&
git add olddir &&
git mv olddir newdir &&
git commit -m "Dir renamed" &&
git switch topic &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m A &&
test_seq 1 8 >olddir/a &&
test_seq 1 8 >olddir/b &&
git add olddir &&
git mv olddir otherdir &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
#git cherry-pick upstream..topic &&
grep CONFLICT..rename/rename output &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 2 calls
)
'
# Helper for the next two tests
test_setup_upstream_rename () {
test_create_repo $1 &&
(
cd $1 &&
test_seq 3 8 >somefile &&
test_seq 3 8 >relevant-rename &&
git add somefile relevant-rename &&
mkdir olddir &&
test_write_lines a b c d e f g >olddir/a &&
test_write_lines z y x w v u t >olddir/b &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >somefile &&
test_seq 1 8 >relevant-rename &&
git add somefile relevant-rename &&
git mv relevant-rename renamed &&
echo h >>olddir/a &&
echo s >>olddir/b &&
git add olddir &&
git mv olddir newdir &&
git commit -m "Dir renamed"
)
}
#
# In the following testcase, upstream renames a file in the toplevel directory
# as well as its only directory:
# Base: relevant-rename_1
# somefile
# olddir/a
# olddir/b
# Upstream: rename relevant-rename_1 -> renamed_2
# rename olddir/ -> newdir/
# Topic_1: relevant-rename_3
# Topic_2: olddir/newfile_1
# Topic_3: olddir/newfile_2
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge. However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either. We want to make sure that
# olddir/newfile doesn't exist in the result and that newdir/newfile does.
#
# We also test that we only do rename detection twice. We never need
# rename detection on the topic side of history, but we do need it twice on
# the upstream side of history. For the first topic commit, we only need
# the
# relevant-rename -> renamed
# rename, because olddir is unmodified by Topic_1. For Topic_2, however,
# the new file being added to olddir means files that were previously
# irrelevant for rename detection are now relevant, forcing us to repeat
# rename detection for the paths we don't already have cached. Topic_3 also
# tweaks olddir/newfile, but the renames in olddir/ will have been cached
# from the second rename detection run.
#
test_expect_success 'dir rename unneeded, then add new file to old dir' '
test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
(
cd dir-rename-unneeded-until-new-file &&
git switch topic &&
test_seq 3 10 >relevant-rename &&
git add relevant-rename &&
git commit -m A &&
echo foo >olddir/newfile &&
git add olddir/newfile &&
git commit -m B &&
echo bar >>olddir/newfile &&
git add olddir/newfile &&
git commit -m C &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 5 tracked &&
test_path_is_missing olddir/newfile &&
test_path_is_file newdir/newfile
)
'
#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
# Base: relevant-rename_1
# somefile_1
# olddir/a
# olddir/b
# Upstream: rename relevant-rename_1 -> renamed_2
# rename olddir/ -> newdir/
# Topic_1: relevant-rename_3
# Topic_2: rename somefile -> olddir/newfile_2
# Topic_3: modify olddir/newfile_3
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge. However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either. We want to make sure that
# neither somefile or olddir/newfile exists in the result and that
# newdir/newfile does.
#
# This testcase needs one more call to rename detection than the last
# testcase, because of the somefile -> olddir/newfile rename in Topic_2.
test_expect_success 'dir rename unneeded, then rename existing file into old dir' '
test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside &&
(
cd dir-rename-unneeded-until-file-moved-inside &&
git switch topic &&
test_seq 3 10 >relevant-rename &&
git add relevant-rename &&
git commit -m A &&
test_seq 1 10 >somefile &&
git add somefile &&
git mv somefile olddir/newfile &&
git commit -m B &&
test_seq 1 12 >olddir/newfile &&
git add olddir/newfile &&
git commit -m C &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 3 calls &&
test_path_is_missing somefile &&
test_path_is_missing olddir/newfile &&
test_path_is_file newdir/newfile &&
git ls-files >tracked &&
test_line_count = 4 tracked
)
'
# Helper for the next two tests
test_setup_topic_rename () {
test_create_repo $1 &&
(
cd $1 &&
test_seq 3 8 >somefile &&
mkdir olddir &&
test_seq 3 8 >olddir/a &&
echo b >olddir/b &&
git add olddir somefile &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch topic &&
test_seq 1 8 >somefile &&
test_seq 1 8 >olddir/a &&
git add somefile olddir/a &&
git mv olddir newdir &&
git commit -m "Dir renamed" &&
test_seq 1 10 >somefile &&
git add somefile &&
mkdir olddir &&
>olddir/unrelated-file &&
git add olddir &&
git commit -m "Unrelated file in recreated old dir"
)
}
#
# In the following testcase, the first commit on the topic branch renames
# a directory, while the second recreates the old directory and places a
# file into it:
# Base: somefile
# olddir/a
# olddir/b
# Upstream: olddir/newfile
# Topic_1: somefile_2
# rename olddir/ -> newdir/
# Topic_2: olddir/unrelated-file
#
# Note that the first pick should merge:
# Base: somefile
# olddir/{a,b}
# Upstream: olddir/newfile
# Topic_1: rename olddir/ -> newdir/
# For which the expected result (assuming merge.directoryRenames=true) is
# clearly:
# Result: somefile
# newdir/{a, b, newfile}
#
# While the second pick does the following three-way merge:
# Base (Topic_1): somefile
# newdir/{a,b}
# Upstream (Result from 1): same files as base, but adds newdir/newfile
# Topic_2: same files as base, but adds olddir/unrelated-file
#
# The second merge is pretty trivial; upstream adds newdir/newfile, and
# topic_2 adds olddir/unrelated-file. We're just testing that we don't
# accidentally cache directory renames somehow and rename
# olddir/unrelated-file to newdir/unrelated-file.
#
# This testcase should only need one call to diffcore_rename_extended().
test_expect_success 'caching renames only on upstream side, part 1' '
test_setup_topic_rename cache-renames-only-upstream-add-file &&
(
cd cache-renames-only-upstream-add-file &&
git switch upstream &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m "Add newfile" &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 1 calls &&
git ls-files >tracked &&
test_line_count = 5 tracked &&
test_path_is_missing newdir/unrelated-file &&
test_path_is_file olddir/unrelated-file &&
test_path_is_file newdir/newfile &&
test_path_is_file newdir/b &&
test_path_is_file newdir/a &&
test_path_is_file somefile
)
'
#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
# Base: somefile
# olddir/a
# olddir/b
# Upstream: somefile_1 -> olddir/newfile
# Topic_1: rename olddir/ -> newdir/
# somefile_2
# Topic_2: olddir/unrelated-file
# somefile_3
#
# Much like the previous test, this case is actually trivial and we are just
# making sure there isn't some spurious directory rename caching going on
# for the wrong side of history.
#
#
# This testcase should only need two calls to diffcore_rename_extended(),
# both for the first merge, one for each side of history.
#
test_expect_success 'caching renames only on upstream side, part 2' '
test_setup_topic_rename cache-renames-only-upstream-rename-file &&
(
cd cache-renames-only-upstream-rename-file &&
git switch upstream &&
git mv somefile olddir/newfile &&
git commit -m "Add newfile" &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 4 tracked &&
test_path_is_missing newdir/unrelated-file &&
test_path_is_file olddir/unrelated-file &&
test_path_is_file newdir/newfile &&
test_path_is_file newdir/b &&
test_path_is_file newdir/a
)
'
test_done