Wednesday, May 16, 2012

Repositories Spring Cleaning!

Over the next few days, we will gradually remove duplicated 3rd party repositories from Studio. Official and private user repositories will not be affected. Affected appliances will be automatically updated.

What impact might this have on you?

Most users will not notice anything different, but some may encounter a couple of side-effects:

  1. Your 3rd party repositories in Studio may now have a different name. The name of the repository being used by your appliance may change, but you don't have to worry about that because:
    • Your account was not compromised - the change is done by our cleanup script.
    • The contents of the affected repositories (if any) should be identical.
    • The official and private user repositories are not affected.
  2. Your appliance may have software resolution errors. This is rare, but can happen if the explicitly requested software version is not available in the new repository (e.g. the old version is no longer in the repository or in the Studio cache). If this happens, Studio will propose the following solutions:
    • Add the latest version of the package: This will explicitly require the latest version of the package from the repositories in your appliance.
    • Do not require a specific version of the package: This removes the explicit version constraint, pulling in the latest version instead.
    • Remove the package: No longer install the package in the appliance.
If you must have the old package, you can either package it inside of a dedicated repository with the openSUSE Build Service or upload the RPM to Studio.
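
The three proposals listed above can be pictured with a small sketch. This is not Studio's actual resolver: the function name `propose_fixes`, the tuple format, and the assumption that available versions are listed oldest first are all hypothetical, chosen only to illustrate the choices.

```python
def propose_fixes(name, pinned, available):
    """Return illustrative proposals for a package pinned to a missing version.

    `available` lists the versions offered by the appliance's repositories,
    oldest first (an assumption that avoids RPM version comparison here).
    """
    if pinned is None or pinned in available:
        return []  # no explicit version, or the version still exists

    if not available:
        # Assumption: if no version is offered at all, only removal is left.
        return [("remove", name, None)]

    latest = available[-1]
    return [
        ("pin_latest", name, latest),  # explicitly require the latest version
        ("unpin", name, None),         # drop the explicit version constraint
        ("remove", name, None),        # no longer install the package
    ]


if __name__ == "__main__":
    for fix in propose_fixes("foo", "1.0", ["1.1", "1.2"]):
        print(fix)
```

Uploading the old RPM, as described above, corresponds to the case where the pinned version is available again and no proposal is needed.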

Please contact us via the forum or mailing list if you have any questions or problems.

Why are we doing this?

It’s spring time once again and so we’re busy with housekeeping to maintain a reasonably fast and responsive site, even as the number of users grows. This week’s spring cleaning target is the software repositories in Studio. There are three types of software repositories that can be added to your SUSE Studio appliances:

  • Official repositories: Repositories added by the Studio administrators, like openSUSE 12.1 OSS and SLES 11 SP2 x86_64.
  • 3rd party repositories: Public repositories hosted outside of Studio that have been added by Studio users, such as those from the openSUSE Build Service and PackMan.
  • Private user repositories: Repositories that are automatically created and hosted by Studio whenever you upload an RPM to your appliance in the software tab. These are private to your appliance and are only accessible by Studio.

For faster appliance builds and improved reliability (e.g. builds will still work if the external repository is temporarily down), all RPMs from these repositories are cached by Studio. Whenever a new repository is added, all the RPMs within it are added to the download queue. An RPM is bumped up in the queue if it is required by an appliance build (the build process waits for the download to be completed).
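
As a rough illustration of a download queue with bumping, here is a minimal sketch built on Python's `heapq`. The class `DownloadQueue` and its methods are hypothetical names, and the two priority levels are an assumption; the post does not describe how Studio's queue is implemented.

```python
import heapq
import itertools


class DownloadQueue:
    """Queue of RPM downloads; a lower priority number is served first."""

    NORMAL, URGENT = 1, 0

    def __init__(self):
        self._heap = []
        self._entries = {}                 # rpm name -> live heap entry
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, rpm, priority=NORMAL):
        """Queue an RPM unless it is already waiting."""
        if rpm in self._entries:
            return
        entry = [priority, next(self._counter), rpm, True]
        self._entries[rpm] = entry
        heapq.heappush(self._heap, entry)

    def bump(self, rpm):
        """Move a waiting RPM ahead of normal downloads (a build needs it)."""
        entry = self._entries.get(rpm)
        if entry is None or entry[0] == self.URGENT:
            return
        entry[3] = False         # mark the old entry dead (lazy deletion)
        del self._entries[rpm]
        self.add(rpm, self.URGENT)

    def pop(self):
        """Return the next RPM to download, skipping dead entries."""
        while self._heap:
            _, _, rpm, live = heapq.heappop(self._heap)
            if live:
                del self._entries[rpm]
                return rpm
        raise IndexError("queue is empty")


if __name__ == "__main__":
    q = DownloadQueue()
    for name in ("a.rpm", "b.rpm", "c.rpm"):
        q.add(name)
    q.bump("c.rpm")  # an appliance build is waiting for c.rpm
    print([q.pop() for _ in range(3)])
```

Lazy deletion keeps `bump` cheap: the stale entry stays in the heap and is simply skipped when it reaches the front.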

With more than 18,000 repositories, these cached RPMs use quite a few terabytes on our storage servers. The same RPM often appears in several repositories, so we use file deduplication to reduce the overall storage footprint.
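
One common way to deduplicate files is to store each distinct content once, keyed by a hash of its bytes. The sketch below assumes that approach; the post does not say which mechanism Studio uses, and `DedupStore` is a hypothetical name.

```python
import hashlib


class DedupStore:
    """In-memory sketch of content-addressed storage for cached RPMs."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> file content
        self.index = {}  # (repository, filename) -> digest

    def put(self, repo, filename, data):
        """Record a file; identical content is stored only once."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.index[(repo, filename)] = digest
        return digest

    def get(self, repo, filename):
        """Fetch a file's content through its repository entry."""
        return self.blobs[self.index[(repo, filename)]]

    def stored_bytes(self):
        """Bytes actually kept after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    store = DedupStore()
    payload = b"x" * 10  # stand-in for an RPM's content
    store.put("repoA", "foo.rpm", payload)
    store.put("repoB", "foo.rpm", payload)  # same file in a second repository
    print(len(store.index), "entries,", store.stored_bytes(), "bytes stored")
```

Each repository keeps its own index entry, so removing a repository does not affect files that other repositories still reference.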

Duplicate repositories were initially avoided by checking the repository URL, but this is insufficient because it does not handle repository mirrors, which serve the same content from different URLs. Studio now checks the repository ID instead, so we can detect these mirrors and remove them. This does not reduce the storage footprint much since the RPMs are already deduplicated at the file level, but it does save on repository metadata, resulting in faster and cleaner repository searches.
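
The difference between checking URLs and checking IDs can be sketched as follows. The record layout `(name, url, repo_id)`, the function `find_duplicates`, and the rule of keeping the first-added repository are assumptions for illustration; the post does not describe the cleanup script itself.

```python
from collections import defaultdict


def find_duplicates(repos):
    """Map each duplicate repository name to the repository that is kept.

    `repos` is an iterable of (name, url, repo_id) tuples in the order the
    repositories were added. Mirrors have different URLs but share an ID.
    """
    by_id = defaultdict(list)
    for name, _url, repo_id in repos:
        by_id[repo_id].append(name)

    remap = {}
    for names in by_id.values():
        keep, *dupes = names  # assumption: keep the first one added
        for dupe in dupes:
            remap[dupe] = keep
    return remap


if __name__ == "__main__":
    repos = [
        ("packman-main", "http://a.example/packman", "id-1"),
        ("packman-mirror", "http://b.example/packman", "id-1"),
        ("obs-tools", "http://c.example/tools", "id-2"),
    ]
    print(find_duplicates(repos))
```

An appliance that used a removed duplicate would be pointed at the kept repository, which is why a repository name in an appliance may change.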

What is coming next?

We are close to completing the first phase of revamping our backend repository and package handling service, which will allow us to overcome the limitations of the current system. Stay tuned; we will blog about this in the near future.