The only sensible way to clean up old pages (given the current
architecture) is to schedule a bucket policy. The bucket policy is
evaluated at a low API level and deletes even huge amounts of data
with little impact on performance.
The program finds the first 1000 obsolete site versions and schedules
a policy with rules to remove them. This is far from perfect, but a
decent solution for now. The lessons learned from trying to implement
the cleanup in various ways will be reflected in future updates to the
pages & storage architecture.
Deleting individual objects is painfully slow. Instead, use a lifecycle
policy rule to expire a given prefix. The actual deleting will then be
done by internal mechanisms, which are vastly more efficient.
The following changes apply:
- Media (img, audio, video) can be embedded from any HTTPS origin
- iframes can be embedded from any HTTPS origin
- Scripts & workers are still limited to same origin
This allows recursive patterns like "**/*.txt". Go's path.Match
only matches an explicit nesting depth, so to accomplish the same thing
you'd need to repeat the pattern with different numbers of asterisks
("*.txt", "*/*.txt", "*/*/*.txt"), depending on how deep your site's
directory structure is.
Currently, files are first uploaded to a temporary location as the hash
of all content is required to compute their final location.
Unfortunately, the move that occurs afterwards has copy & delete
semantics, so it can take a lot of requests and time for sites with many
files, even causing timeouts.
With this commit, the final location is simply determined by generating
a new UUID, so files can be uploaded right to it. The only downside is
that an update will happen even if you upload the same site twice, but
given that this really has no user-facing effect and the work done
before this gets handled is already significant anyway, the benefits
clearly outweigh it.
Cuts publish time for sites with many files to roughly a third.
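
A rough sketch of the idea, using only the standard library. newUUID
and objectKey are hypothetical names, and the "sites/<uuid>/<file>" key
layout is an assumption made for illustration. The actual code may well
use a UUID package instead.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random (version 4) UUID in its canonical text form.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// objectKey builds the final storage key for a file. The layout is an
// assumption; the point is that it does not depend on any content hash.
func objectKey(siteID, file string) string {
	return "sites/" + siteID + "/" + file
}

func main() {
	id, err := newUUID()
	if err != nil {
		panic(err)
	}
	// The final location is known before the first byte is uploaded,
	// so no temporary location and no copy & delete move is needed.
	for _, f := range []string{"index.html", "css/site.css"} {
		fmt.Println(objectKey(id, f))
	}
}
```
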
The `Host` part of any URI can contain a port, which does not go down
well with the resolver. Use `Hostname()` instead.
See https://pkg.go.dev/net/url#URL.Hostname
Refuse to publish a site for a custom domain if the DNS records
aren't set up properly. This should make it easier for users to
understand why a site doesn't work.
Something went wrong when applying f209b016 and it is missing most of
the changes that were intended. This commit is a fixup adding all those
missing changes. See the commit message of f209b016 for details.