This project is considered feature complete for the primary maintainer. If you would like a bugfix or enhancement and can not sponsor the work, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if desired.
The package can be used to back up an *entire* `GitHub <https://github.com/>`_ organization, repository, or user account, including starred repos, issues, and wikis, in the most appropriate format (clones for wikis, JSON files for issues).

Requirements
============

- GIT 1.9+
- Python

Installation
============

Using PIP via PyPI::

    pip install github-backup

Using PIP via Github (more likely the latest version)::
Python scripts are unlikely to be included in your ``$PATH`` by default, which means the script cannot be run directly in a terminal with ``$ github-backup ...``. You can either add Python's install path to your environment's ``$PATH`` or call the script directly, e.g. using ``$ ~/.local/bin/github-backup``.
The package can be used to back up an *entire* organization or repository, including issues and wikis, in the most appropriate format (clones for wikis, JSON files for issues).

Usage Details
=============

Authentication
--------------

**Password-based authentication** will fail if you have two-factor authentication enabled, and will `be deprecated <https://github.blog/2023-03-09-raising-the-bar-for-software-security-github-2fa-begins-march-13/>`_ by the end of 2023.

``--username`` is used for basic password authentication and is separate from the positional argument ``USER``, which specifies the user account you wish to back up.

**Classic tokens** are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ because they provide only very coarse-grained permissions.

If you need authentication for long-running backups (e.g. for a cron job), it is recommended to use a **fine-grained personal access token** with ``-f TOKEN_FINE``.

Fine Tokens
~~~~~~~~~~~

You can "generate new token", choosing the repository scope by selecting specific repos or all repos. On GitHub this is under *Settings -> Developer Settings -> Personal access tokens -> Fine-grained Tokens*.

Customise the permissions for your use case, but for a full backup of a personal account you'll need to enable the following permissions:

**User permissions**: Read access to followers, starring, and watching.

**Repository permissions**: Read access to code, commit statuses, issues, metadata, pages, pull requests, and repository hooks.

Prefer SSH
~~~~~~~~~~

If cloning repos is enabled with ``--repositories``, ``--all-starred``, ``--wikis``, ``--gists`` or ``--starred-gists``, the ``--prefer-ssh`` argument will use SSH for cloning the git repos. All other connections will still use their own protocol, e.g. API requests for issues use HTTPS.

To clone with SSH, you'll need SSH authentication set up `as usual with GitHub <https://docs.github.com/en/authentication/connecting-to-github-with-ssh>`_, e.g. via SSH public and private keys.

Using the Keychain on Mac OSX
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: On Mac OSX the token can be stored securely in the user's keychain. To do this:

1. Open Keychain from "Applications -> Utilities -> Keychain Access"
@@ -152,31 +187,137 @@ Note: When you run github-backup, you will be asked whether you want to allow "

1. **Allow:** In this case you will need to click "Allow" each time you run ``github-backup``.
2. **Always Allow:** In this case, you will not be asked for permission when you run ``github-backup`` in future. This is less secure, but is required if you want to schedule ``github-backup`` to run automatically.

GitHub Rate-limit and Throttling
--------------------------------

``github-backup`` will automatically throttle itself based on feedback from the GitHub API.

The API is usually rate-limited to 5000 calls per hour. When the limit is reached, the API asks github-backup to pause until a specific time when the limit is reset again (at the start of the next hour). This continues until the backup is complete.

During a large backup, such as ``--all-starred``, on a fast connection this can result in pauses of around 20 minutes, with bursts of API calls periodically maxing out the API limit. If this is not suitable, `it has been observed <https://github.com/josegonzalez/python-github-backup/issues/76#issuecomment-636158717>`_ under real-world conditions that overriding the throttle with ``--throttle-limit 5000 --throttle-pause 0.6`` provides a smooth rate across the hour, although ``--throttle-pause 0.72`` (3600 seconds [1 hour] / 5000 limit) is theoretically safer to prevent large rate-limit pauses.

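The arithmetic behind the ``0.72`` figure can be checked with a few lines of Python. This is only a sketch of the calculation, not code from github-backup; the function names are made up for illustration, and it assumes the time taken by each request is ignored.

```python
SECONDS_PER_HOUR = 3600


def even_throttle_pause(calls_per_hour: int) -> float:
    """Seconds to wait between calls so the hourly quota is spread evenly."""
    if calls_per_hour <= 0:
        raise ValueError("calls_per_hour must be positive")
    return SECONDS_PER_HOUR / calls_per_hour


def calls_in_one_hour(pause_seconds: float) -> float:
    """Upper bound on calls per hour for a given pause.

    Assumption: each request itself takes no time, so real throughput
    will be lower than this figure.
    """
    if pause_seconds <= 0:
        raise ValueError("pause_seconds must be positive")
    return SECONDS_PER_HOUR / pause_seconds


print(even_throttle_pause(5000))  # pause matching a 5000 calls/hour limit
print(calls_in_one_hour(0.6))     # upper bound when pausing 0.6 s per call
```

With a 0.6 second pause the upper bound is above 5000 calls per hour, so that value relies on request latency to stay under the limit, whereas 0.72 seconds stays within the limit even if requests were instantaneous.
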
About Git LFS
-------------

When you use the ``--lfs`` option, you will need to make sure you have Git LFS installed.

Instructions on how to do this can be found on https://git-lfs.github.com.

Gotchas / Known-issues
======================

All is not everything
---------------------

The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more.

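One way to keep track of what ``--all`` leaves out is to assemble the argument list explicitly. The sketch below only builds and prints a list and does not run anything; ``build_args`` and ``NOT_IN_ALL`` are hypothetical names, authentication arguments are deliberately omitted, and ``--gists`` is assumed to be the flag meant by "cloning gists".

```python
# Flags that the text above lists as NOT covered by --all.
# Assumption: "--gists" is the flag intended for "cloning gists".
NOT_IN_ALL = [
    "--private",
    "--fork",
    "--all-starred",
    "--pull-details",
    "--lfs",
    "--gists",
    "--starred-gists",
]


def build_args(user, extras=()):
    """Return an argument list: --all plus explicitly requested extras.

    Only flags from NOT_IN_ALL are accepted as extras, so a typo is
    reported instead of being passed along silently.
    """
    unknown = [flag for flag in extras if flag not in NOT_IN_ALL]
    if unknown:
        raise ValueError(f"not in the documented list: {unknown}")
    return ["github-backup", user, "--all", *extras]


# YOUR-GITHUB-USER is a placeholder for the positional USER argument.
print(build_args("YOUR-GITHUB-USER", ["--private", "--fork", "--gists"]))
```

Listing the extras explicitly makes it obvious at a glance which parts of an account a scheduled backup does and does not cover.
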
Cloning all starred size
------------------------

Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space, especially if ``--all`` or more arguments are used. For example, commonly starred repos can have tens of thousands of issues and many large assets, in addition to the repo itself. Consider just storing links to starred repos in JSON format with ``--starred``.

Incremental Backup
------------------

Using ``-i, --incremental`` will only request new data from the API **since the last run (successful or not)**. For example, only issues since the last run are requested from the API.

This means any blocking errors on previous runs can cause a large amount of missing data in backups.

Known blocking errors
---------------------

Some errors will block the backup run by exiting the script, e.g. receiving a 403 Forbidden error from the GitHub API.

If the incremental argument is used, the next backup will only request API data since the last blocked/failed run, potentially causing unexpectedly large amounts of missing data.

It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complemented with periodic full non-incremental runs, to avoid unexpected missing data in regular backup runs.

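The recommendation to mix incremental runs with periodic full runs could be sketched as a small scheduling helper. This is an illustration only, not part of github-backup: ``backup_flags`` and the weekly schedule are assumptions, and the returned flags would still need to be combined with the rest of your usual arguments.

```python
import datetime


def backup_flags(today, full_run_weekday=6):
    """Choose between an incremental and a full run for a given date.

    full_run_weekday follows date.weekday(): Monday is 0, Sunday is 6.
    On that weekday --incremental is left out, so all data is requested.
    """
    if today.weekday() == full_run_weekday:
        return []  # full, non-incremental run
    return ["--incremental"]


print(backup_flags(datetime.date(2024, 1, 7)))  # a Sunday
print(backup_flags(datetime.date(2024, 1, 8)))  # a Monday
```

A weekly full run limits how long data missed after a blocked run can go unnoticed, at the cost of more API calls on that day.
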
1. **Starred public repo hooks blocking**

   Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` together to clone a user's starred public repositories, the backup will likely error and block the backup from continuing.

   This is due to needing the correct permission for ``--hooks`` on public repos.

2. **Releases blocking**

   A known ``--releases`` (required for ``--assets``) error will sometimes block the backup.

   If you're backing up a lot of repositories with releases, e.g. an organisation or ``--all-starred``, you may need to remove ``--releases`` (and therefore ``--assets``) to complete a backup. Documented in `issue 209 <https://github.com/josegonzalez/python-github-backup/issues/209>`_.

"bare" is actually "mirror"
---------------------------

Using the bare clone argument (``--bare``) will actually call git's ``clone --mirror`` command. There's a subtle difference between a `bare <https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---bare>`_ and a `mirror <https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---mirror>`_ clone.

*From the git docs: "Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository."*

Starred gists vs starred repo behaviour
---------------------------------------

The starred repo cloning argument (``--all-starred``) stores starred repos separately from the user's own repositories. However, using ``--starred-gists`` will store starred gists within the same directory as the user's own gists (``--gists``). Also, all gist repo directory names are IDs, not the gist's name.

The ``--skip-existing`` argument will skip a backup if the directory already exists, even if the backup in that directory failed (perhaps due to a blocking error). This may result in unexpected missing data in a regular backup.

GitHub Backup Examples
======================

Back up all repositories, including private ones, using a classic token::

Quietly and incrementally back up useful GitHub user data (public and private repos with SSH), including all issues, pulls, all public starred repos and gists (omitting "hooks", "releases" and therefore "assets" to prevent blocking). *Great for a cron job.* ::

This project is considered feature complete for the primary maintainer @josegonzalez. If you would like a bugfix or enhancement, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if you'd like to sponsor the work instead.