| @@ -1,109 +0,0 @@
-Ok... so the debian repo is essentially a directory hierarchy...
-
-Ok.. Do you understand the repo hierarchy? ie the main folder (in
-amprolla's case /merged) with sub folders 'dists' (for repo metadata) and
-'pool' (where the actual binary and source packages go)??
-forget about the "pool" folder, amprolla doesn't touch it...
-
-in "dists/" you have all the suites ie: jessie, ascii, ceres along with
-the stable, unstable and version symlinks.
-
-in the suite folder, you find the section folders: main contrib non-free
-and files InRelease, Release and Release.gpg
-
-InRelease is just the inline-signed (clearsigned PGP) version of the Release
-file - the gpg sig is the same as Release.gpg
-
-Anyway the Release file basically is a dictionary of most of the files
-in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
-is essentially RFC822 format, with a bunch of headers at the top that
-specify details about the Release of that suite.
-
-In the suite subdirectories you have a bunch of folders, binary-
-which contains the Packages file, and compressed copies of that, and a
-Release Stanza, and similar for the source folder with Sources file and
-compressed copies etc.
-
-the Contents files (currently not processed) are there too.
-(They contain a list of all the files in each package)
-
-there is also the i18n folder which contains the processed files.
-oops s/processed files/translation files/
-
-
-Amprolla takes several mirrors and merges them in order of priority
-starting with the highest priority. It first iterates over the structure
-to create its repo structure, ie dists/// etc and then first
-copies the highest priority mirror Packages and Sources files in and then for
-the other mirrors iterates over the Packages and Sources files and compares
-each package stanza for a match, and if there is a match on name then the highest
-priority mirror version is kept, if not then the package is added in.
-(This is where the inefficient model really shows up)
-
-
-After all the new Sources and Packages files are processed then the Release and
-InRelease files are generated by walking the hierarchy and adding those files in.
-
-There are a lot of complexities, part of which is in the design of amprolla.
-What I had started to do (and in describing it now, it seems obvious to me
-I should probably have started pretty much from scratch) is, instead of this
-iterative approach of compare and add or skip, keep a cache of each mirror's
-last state, and then on each run create a delta between the last state and
-current state.
-
-
-* and how does dak integrate in all of this?
-it doesn't. Dak is a standalone repository which just deals with the packages built by our CI
-* so it's the same as any debian repo
-Yup, slightly modified to handle our CI and some other tweaks
-and I checked and our version is in gdo too.
-
-
-anyway as I was saying about my approach re deltas:
-There are big efficiencies in this approach. For starters, we only download the InRelease or
-Release and Release.gpg file and after verifying it, compare to the previous state, and we
-can use the delta generated to pick what files are new, changed or removed from the repo.
-This means we only download the changed files in the repo for a start. And for the
-Packages and Sources files we create a delta list of changed stanzas to apply.
-
-Instead of building the entire repo from scratch, we apply the delta
-to a copy of our merged repo with handling for priority etc...
-
-What stumped me in the end is we actually should verify that we only have packages go in that
-have a matching source stanza and we really need to process the contents and translations
-at the same time.
-
-I suspect that nextime realised this which is why he started on amprolla2 which essentially
-replicates dak + amprolla function...
-
-I just realised, I forgot to mention the overrides processing in amprolla. In the very
-top of the dir in "merged/" is the "indices" folder that contains overrides. These
-files specify, for each Packages file, any metadata changes that need to be applied to
-package stanzas.
-
-In debian there is an entry for every single deb package/source in the archive making
-them very large. We did away with that to reduce the overhead of processing it created.
-
-So we only have entries for those that need changing, usually to change priorities of
-systemd packages and remove recommends and suggests for systemd related packages.
-
-* are indices a part of the repo or only needed by amprolla?
-both. In debian, dak generates them and they are hand modified by the repo masters to
-apply needed fixes. With amprolla, we only create them for applying our own changes as needed.
-Technically they don't need to be in the repo, as they're not used by apt, but practically
-it's good to have them there.
-
-hmmm, I think I've cracked my problem...
-If I use the Sources delta to identify changed packages, I can use that to pick and apply
-the changed Packages stanzas, Contents and Translations. This would save lots of
-iterations, and I only need the delta Processing to be done on the Sources files.
-Wow that would really speed things up
-
-The other benefit is we can side load packages this way too and use it to replace dak
-as well, as either a standalone repo or directly into the merged repo.
-And all without a hefty database. or the writeup
-
-you're welcome. It has helped me probably as much as you. I think it's
-turning into a full rewrite, but seems better design and possibly far easier to
-write from scratch.
-Anyway, it's nearly 3:30am here, so better get a couple hours sleep! |
| @@ -2,7 +2,7 @@
# see LICENSE file for copyright and license details
"""
-Module used to orchestrace the entire amprolla merge
+Module used to orchestrate the entire amprolla merge
"""
from os.path import join
@@ -12,8 +12,6 @@ from lib.config import (arches, categories, suites, mergedir, mergesubdir,
pkgfiles, srcfiles, spooldir, repos)
from lib.release import write_release
-# from pprint import pprint
-
def do_merge():
"""
@@ -33,7 +31,7 @@ def do_merge():
am = __import__('amprolla_merge')
- p = Pool(4)
+ p = Pool(4) # Set it to the number of CPUs you want to use
p.map(am.main, pkg)
|
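The delta idea from the removed notes (cache each mirror's last Release state, then compare it with the current one to find new, changed and removed files) could be sketched roughly as below. This is only an illustration, not amprolla code: the function names `parse_release_checksums` and `release_delta` are hypothetical, and it assumes the usual Release layout where a `SHA256:` header is followed by indented `checksum size path` lines.

```python
def parse_release_checksums(text, field="SHA256"):
    """Return {path: (checksum, size)} for one checksum field of a Release file.

    Hypothetical helper: assumes the field header (e.g. "SHA256:") is followed
    by lines starting with a space, each holding "checksum size path".
    """
    entries = {}
    in_field = False
    for line in text.splitlines():
        if in_field and line.startswith(" "):
            checksum, size, path = line.split()
            entries[path] = (checksum, int(size))
        else:
            # Any non-indented line either opens our field or ends it.
            in_field = line.startswith(field + ":")
    return entries


def release_delta(old, new):
    """Compare two parsed Release states and list what needs fetching."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "changed": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
        "removed": sorted(old_paths - new_paths),
    }
```

With something like this, only the paths under "added" and "changed" would need downloading on a run, which is the saving the notes describe. Signature verification of the Release file is assumed to happen before parsing and is left out here.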