Site admins can sync Git repositories hosted on GitHub.com and GitHub Enterprise with Sourcegraph so that users can search and navigate the repositories.
There are two ways to connect with GitHub: using a GitHub App or using an access token. The following GitHub versions are supported:
- GitHub.com
- GitHub Enterprise v2.10 and newer
Sourcegraph 5.1+
To create a GitHub App and connect it to Sourcegraph:
- Go to Site admin > Repositories > GitHub Apps on Sourcegraph.
- Click Create GitHub App.
- Enter a name for your app (it must be unique across your GitHub instance) and the URL of your GitHub instance.
  You may optionally specify an organization to register the app with. If no organization is specified, the app will be owned by the account of the user who creates it on GitHub. This is the default.
  You may also optionally set the App visibility to public. A GitHub App must be made public if you wish to install it on multiple organizations or user accounts. The default is private.
- When you click Create GitHub App, you will be redirected to GitHub to confirm the details of the App to be created.
- To complete the setup on GitHub, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. The default is All repositories. Sourcegraph cannot sync any repositories that you choose to omit. You can change this later.
- Click Install. Once complete, you will be redirected back to Sourcegraph, where you will now be able to view and manage the details of your new GitHub App from within Sourcegraph.
- Sourcegraph needs to map Sourcegraph users to GitHub users. Click Reveal secret to get the JSON configuration for the auth provider and copy/paste it into the "auth.providers" section of your site configuration.
- Click Add connection under your new installation to create a code host connection to GitHub with this App installation. By default, it will sync all repositories the App can access within the namespace where it was installed. Repository permission enforcement will also be turned on by default.
  You can now select repositories to sync or see more configuration options in the configuration section.
- (Optional) If you want to sync repositories from other organization or user namespaces and your GitHub App is set to public visibility, you can create additional installations with Add installation.
NOTE: If you are using Batch Changes, you can create a GitHub App to perform commit signing (Beta).
The initial GitHub App setup will only install the App on the organization or user account that you registered it with. If your code is spread across multiple organizations or user accounts, you will need to create additional installations for each namespace that you want Sourcegraph to sync repositories from.
By default, Sourcegraph creates a private GitHub App, which only allows the App to be installed on the same organization or user account that it was created in. If you did not set the App to public visibility during creation, you will need to change the visibility to public before you can install it in other namespaces. For security considerations, see GitHub's documentation on private vs public apps.
Once public, the App can be installed in additional namespaces either from Sourcegraph or from GitHub.
To install the App from Sourcegraph:
- Go to Site admin > Repositories > GitHub Apps and click Edit on the App you want to install in another namespace. You'll be taken to the App details page.
- Click Add installation. You will be redirected to GitHub to pick which other organization to install the App on and finish the installation process.
  NOTE: Only organization owners can install GitHub Apps on an organization. If you are not an owner, you will need to ask an owner to install the App for you.
- As before, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. Once you click Install and the setup completes, you will be redirected back to Sourcegraph, where you will now see your additional installation listed.
- To sync repositories from this installation, click Add connection under your new installation.
To install the App from GitHub:
- Go to the GitHub App page. You can get there easily from Sourcegraph by clicking View in GitHub for the App you want to install in another namespace.
- Click Configure, or go to App settings > Install App, and select the organization or user account you want to install the App on.
- As before, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. Once you click Install and the setup completes, you will be redirected back to Sourcegraph.
- GitHub App installations will be automatically synced in the background. Return to Site admin > Repositories > GitHub Apps and click Edit on the App you added the new installation for. You'll be taken to the App details page. Once synced, you will see the new installation listed.
- To sync repositories from this installation, click Add connection under your new installation.
You can uninstall a GitHub App from a namespace or remove it altogether at any time.
To remove an installation in a single namespace, click View in GitHub for the installation you want to remove. If you are able to administer Apps in this namespace, you will see Uninstall "[APP NAME]" in the "Danger zone" at the bottom of the page. Click Uninstall to remove the App from this namespace. Sourcegraph will periodically sync installations in the background. It may temporarily throw errors related to the missing installation until the sync completes. You can check the GitHub App details page to confirm the installation has been removed.
To remove an App entirely, go to Site admin > Repositories > GitHub Apps and click Remove for the App you want to remove. You will be prompted to confirm you want to remove the App from Sourcegraph. Once removed from the Sourcegraph side, Sourcegraph will no longer communicate with your GitHub instance via the App unless explicitly reconnected. However, the App will still exist on GitHub unless you also delete it there manually.
Sourcegraph uses the tokens from GitHub Apps in the following ways:
Installation access tokens are short-lived, non-refreshable tokens that give Sourcegraph access to the repositories the GitHub App has been given access to. Sourcegraph uses these tokens to clone repositories and to determine which users should be able to view a repository. These tokens expire after 1 hour.
User access tokens are OAuth tokens that Sourcegraph receives when a user signs into Sourcegraph using the configured GitHub App. Sourcegraph uses these tokens to link the user's Sourcegraph account to their GitHub account, as well as to determine which repositories a user should be able to access. These tokens are refreshable, and by default they expire after 8 hours. Sourcegraph refreshes the user tokens as required.
Sourcegraph 5.1.5+
If you are using a self-signed certificate for your GitHub Enterprise instance, configure tls.external under experimentalFeatures in the site configuration with your certificate(s).
{
  "experimentalFeatures": {
    "tls.external": {
      "certificates": [
        "-----BEGIN CERTIFICATE-----\n..."
      ]
    }
  }
}
To connect GitHub to Sourcegraph with an access token:
- Go to Site admin > Manage code hosts
- Select GitHub.
- Configure the connection to GitHub using the action buttons above the text field. You can add further fields with Cmd/Ctrl+Space auto-completion. See the configuration documentation below.
- Press Add repositories.
In this example, the kubernetes public repository on GitHub is added by selecting Add a single repository and replacing <owner>/<repository> with kubernetes/kubernetes:
{
  "url": "https://github.com",
  "token": "<access token>",
  "orgs": [],
  "repos": [
    "kubernetes/kubernetes"
  ]
}
GitHub requires a token in order to access their API. There are different types of tokens that can be supplied. When using GitHub Apps, this is handled automatically by Sourcegraph.
- GitHub App installation access token: An installation access token is created automatically when you install a GitHub App. Do not set this token in the code host connection configuration. This token gives Sourcegraph the same level of access to repositories as the GitHub App installation.
- Personal access token: This gives Sourcegraph the same level of access to repositories as the account that created the token. If you don't want to mix your personal repositories with your organization's repositories, you could add an entry to the exclude array, or you can use a machine user token or a fine-grained access token.
- Fine-grained access token: Allows scoping access tokens to specific repositories with specific permissions. Consult the table below for the required permissions.
- Machine user token: Generates a token for a machine user that is affiliated with an organization instead of a user account.
No token scopes are required if you only want to sync public repositories and don't want to use any of the following features. Otherwise, the following token scopes are required for specific features:
Feature | Required token scopes |
---|---|
Sync private repositories | repo |
Sync repository permissions | repo |
Batch changes | repo, read:org, user:email, read:discussion, and workflow (learn more) |
WARNING: In addition to the prerequisite token scopes, the account attached to the token must actually have the same level of access to the relevant resources that you are trying to grant. For example:
- If read access to repositories is required, the token must have repo scope and the token's account must have read access to the relevant repositories. This can happen by being directly granted read access to repositories, being on a team with read access to the repository, and so on.
- If write access to repositories is required, the token must have repo scope and the token's account must have write access to all repositories. This can happen by being added as a direct contributor, being on a team with write access to the repository, being an admin for the repository's organization, and so on.
- If write access to organizations is required, the token must have write:org scope and the token's account must have write access for all organizations. This can happen by being an admin in all relevant organizations.
Learn more about how the GitHub API is used and what level of access is required in the corresponding feature documentation.
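As a rough illustration of the scope table above, the following Python sketch computes which classic token scopes a set of features needs and which of them a token is missing. The feature keys and function names are hypothetical and chosen only for this example; the scope names come from the table. GitHub reports the scopes granted to a classic token in the X-OAuth-Scopes response header, which is one possible source for the `granted` argument. The comparison is a plain set difference and does not model scope hierarchies or the account-level access described in the warning, so treat it as a first-pass check only.

```python
# Scopes per feature, transcribed from the table above.
# The dictionary keys are hypothetical labels used only in this sketch.
REQUIRED_SCOPES = {
    "sync_private_repositories": {"repo"},
    "sync_repository_permissions": {"repo"},
    "batch_changes": {"repo", "read:org", "user:email", "read:discussion", "workflow"},
}


def required_scopes(features):
    """Return the sorted union of classic token scopes needed for `features`."""
    scopes = set()
    for feature in features:
        scopes |= REQUIRED_SCOPES[feature]
    return sorted(scopes)


def missing_scopes(features, granted):
    """Return the required scopes that are absent from `granted`, sorted."""
    return sorted(set(required_scopes(features)) - set(granted))
```

For example, `missing_scopes(["batch_changes"], ["repo", "read:org"])` lists the scopes that would still have to be added to the token before using it for Batch Changes.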
Fine-grained tokens can access public repositories, but can only access the private repositories of the account they are scoped to.
When creating your fine-grained access token, select the following permissions depending on the purpose of the token:
Feature | Required token permissions |
---|---|
Sync private repositories | Repository permissions: Contents - Access: Read-only |
Sync repository permissions | Repository permissions: Contents - Access: Read-only |
Batch changes | Unsupported |
WARNING: Fine-grained tokens don't support the repositoryQuery code host connection option or batch changes. Both of these features rely on GitHub's GraphQL API, which is unsupported by fine-grained access tokens.
To clone and search private repositories, we need a GitHub access token with the required scopes and at least read access to the relevant private repositories.
For more details, see GitHub API access.
There are four fields for configuring which repositories are mirrored/synchronized:
- repos: A list of repositories in owner/name format. The order determines the order in which we sync repository metadata and is safe to change.
- orgs: A list of organizations (every repository belonging to the organization will be cloned).
- repositoryQuery: A list of strings with three pre-defined options (public, affiliated, none, none of which are subject to result limitations), and/or a GitHub advanced search query. Note: There is an existing limitation that requires GitHub advanced search queries to return fewer than 1000 results. See this issue for ongoing work to address this limitation.
- exclude: A list of repositories to exclude, which takes precedence over the repos, orgs, and repositoryQuery fields.
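How the four fields combine can be modeled with plain sets. The sketch below is a simplified illustration, not Sourcegraph's implementation: the function name is hypothetical, `org_repos` and `query_repos` stand in for whatever the orgs and repositoryQuery lookups would return, and `exclude` is treated as a list of owner/name strings only.

```python
def select_repositories(repos, org_repos, query_repos, exclude):
    """Model which repositories end up selected for syncing.

    All arguments are iterables of "owner/name" strings. `org_repos` and
    `query_repos` are assumed to be already-resolved results of the `orgs`
    and `repositoryQuery` fields.
    """
    # Every inclusion source contributes candidates.
    candidates = set(repos) | set(org_repos) | set(query_repos)
    # `exclude` takes precedence over all inclusion sources.
    return sorted(candidates - set(exclude))
```

Under this model a repository listed in both repos and exclude is not synced, which matches the precedence rule stated above.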
Always include a token in a configuration for a GitHub.com URL to avoid being denied service by GitHub's unauthenticated rate limits. If you don't want to automatically synchronize repositories from the account associated with your personal access token, you can create a token without the repo scope for the purposes of bypassing rate limit restrictions only.
When Sourcegraph hits a rate limit imposed by GitHub, Sourcegraph waits the appropriate amount of time specified by GitHub before retrying the request. This can be several minutes in extreme cases.
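The waiting behaviour can be sketched in a few lines of Python. This is an illustration of the general technique, not Sourcegraph's actual code: the function name is hypothetical, and it assumes the client reads GitHub's retry-after, x-ratelimit-remaining and x-ratelimit-reset response headers (the last one being a UTC epoch timestamp in seconds).

```python
def seconds_until_retry(headers, now):
    """Return how many seconds to wait before retrying, or 0 if no wait is needed.

    `headers` maps response header names to string values; `now` is the
    current time as seconds since the Unix epoch.
    """
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    # An explicit retry-after value wins when present.
    if "retry-after" in h:
        return max(0, int(h["retry-after"]))
    # Otherwise wait until the quota resets, but only if it is exhausted.
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        return max(0, int(h["x-ratelimit-reset"]) - int(now))
    return 0
```

A caller would typically pass `time.time()` as `now` and sleep for the returned number of seconds before repeating the request.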
On GitHub Enterprise, rate limiting may not be enabled by default. To check the current rate limit settings, you can make a request to the /rate_limit endpoint like this:
$ curl -s https://<github-enterprise-url>/api/v3/rate_limit -H "Authorization: Bearer <token>"
{
  "message": "Rate limiting is not enabled.",
  "documentation_url": "https://docs.github.com/enterprise/3.3/rest/reference/rate-limit#get-rate-limit-status-for-the-authenticated-user"
}
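If you script this check, the response body can be classified as in the following Python sketch. The function name is hypothetical. The sketch assumes that a response with rate limiting enabled carries a "resources" or "rate" object, as GitHub's rate limit endpoint normally returns, and it raises an error for anything else (for example an authentication failure) rather than guessing.

```python
import json


def rate_limiting_enabled(body):
    """Classify the JSON body returned by the /rate_limit endpoint.

    Returns False for the "not enabled" message shown above, True when the
    body contains rate limit data, and raises ValueError otherwise.
    """
    data = json.loads(body)
    if data.get("message") == "Rate limiting is not enabled.":
        return False
    if "resources" in data or "rate" in data:
        return True
    raise ValueError("unexpected /rate_limit response: %r" % (data,))
```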
See Internal rate limits.
Prerequisite for configuring repository permission syncing: Add GitHub as an authentication provider.
Then, add or edit the GitHub connection as described above and include the authorization field:
{
  // ...
  "authorization": {}
}
This needs to be done for every GitHub code host connection if more than one is configured.
Repo-centric permission syncing is done by calling the list repository collaborators GitHub API endpoint. To call this API endpoint correctly, we need a GitHub access token with the required scopes and read and write access to all relevant repositories.
IMPORTANT: We strongly recommend configuring both read and write access to associated repositories for permission syncing due to GitHub's token scope requirements. Without write access, there will be a conflict between user-centric sync and repo-centric sync. In that case, disable repo-centric permission sync (supported in Sourcegraph 5.0.4+).
IMPORTANT: Optional, but strongly recommended - continue with configuring webhooks for permissions.
NOTE: It can take some time to complete a full cycle of repository permissions sync if you have a large number of users or repositories. See sync duration time for more information.
GitHub Enterprise has internal repositories in addition to the usual public and private repositories. Depending on how your organization structure is configured, you may want to make these internal repositories available to everyone on your Sourcegraph instance without relying on permission syncs. To mark all internal repositories as public, add the following field to the authorization field:
{
  // ...
  "authorization": {
    "markInternalReposAsPublic": true
  }
}
If you would like internal repositories to remain private, but you're experiencing issues where user permission syncs aren't granting access to internal repositories, you can add the following field instead:
{
  // ...
  "authorization": {
    "syncInternalRepoPermissions": true
  }
}
NOTE: An explanation of the visibility options in GitHub Enterprise:
- public: Only index public GitHub Enterprise repositories visible to all users. This excludes private and internal repos.
- private: Index both public and private GitHub Enterprise repositories. This allows accessing private repos the token has access to.
- internal: Include GitHub Enterprise internal repositories in addition to public/private repos. Internal repos are only visible to org members.
Follow the link to configure webhooks for permissions for GitHub.
Experimental
WARNING: The following section is experimental and might no longer work properly on newer Sourcegraph versions (4.0 and later). Prefer configuring webhooks for permissions instead.
A GitHub code host connection can leverage caching mechanisms to reduce the number of API calls used when syncing permissions. This can significantly reduce the time it takes to perform a full cycle of permissions sync, because the code host rate limits Sourcegraph less often, and is useful for code hosts with very large numbers of users and repositories.
Sourcegraph can leverage caching of GitHub team and organization permissions.
NOTE: You should only try this if your GitHub setup makes extensive use of GitHub teams and organizations to distribute access to repositories and your number of users * avg_repositories is greater than 250,000 (which roughly corresponds to the scale at which GitHub rate limits might become an issue).
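The rule of thumb in the note can be written as a small check. This is only a sketch of the arithmetic: the function and constant names are hypothetical, and `avg_repositories_per_user` is read here as the average number of repositories each user can access.

```python
# Threshold quoted in the note above (users * avg_repositories).
GROUPS_CACHE_THRESHOLD = 250_000


def should_consider_groups_cache(users, avg_repositories_per_user):
    """Return True when users * avg_repositories exceeds the documented threshold."""
    return users * avg_repositories_per_user > GROUPS_CACHE_THRESHOLD
```

For instance, 5,000 users with access to 100 repositories each gives 500,000, which is above the threshold, while 1,000 users with 100 repositories each gives 100,000, which is below it.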
This caching behaviour can be enabled via the authorization.groupsCacheTTL field:
{
  "url": "https://github.example.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {
    "groupsCacheTTL": 72 // hours
  }
}
In the corresponding authorization provider in site configuration, the allowGroupsPermissionsSync field must be set as well for the correct auth scopes to be requested from users:
{
  // ...
  "auth.providers": [
    {
      "type": "github",
      "url": "https://github.example.com",
      "allowGroupsPermissionsSync": true
    }
  ]
}
A token that has the required scopes and both read and write access to all relevant repositories and organizations is needed to fetch repository permissions and team memberships. Read-only access will not work with cached permissions sync, but will work with careful configuration for regular GitHub permissions sync.
When enabling this feature, we currently recommend a default groupsCacheTTL of 72 (hours, or 3 days). A lower value can be set if your teams and organizations change frequently, though the chosen value must be at least several hours for the cache to be leveraged in the event of being rate-limited (which takes an hour to recover from).
Cache invalidation happens automatically on certain webhook events, so it is recommended to configure webhook support when using cached permissions sync. Caches can also be manually invalidated if necessary.
To force a bypass of caches during a sync, you can manually queue users or repositories for sync with the invalidateCaches option via the Sourcegraph GraphQL API:
mutation {
  scheduleUserPermissionsSync(user: "userid", options: {invalidateCaches: true}) {
    alwaysNil
  }
}
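One way to send this mutation from a script is sketched below in Python. Several details are assumptions rather than facts from this page: the function name is hypothetical, the GraphQL endpoint is assumed to be /.api/graphql on your instance, the access token is assumed to be sent as "Authorization: token ...", and the variable type ID! is assumed for the user argument. The user ID is passed as a GraphQL variable instead of being pasted into the query string.

```python
import json
import urllib.request

# Same mutation as above, with the user ID supplied as a variable.
# The ID! type is an assumption; check it against your instance's schema.
MUTATION = """
mutation ($user: ID!) {
  scheduleUserPermissionsSync(user: $user, options: {invalidateCaches: true}) {
    alwaysNil
  }
}
"""


def build_permissions_sync_request(base_url, access_token, user_id):
    """Build (but do not send) the HTTP request for the mutation."""
    payload = json.dumps({"query": MUTATION, "variables": {"user": user_id}})
    return urllib.request.Request(
        base_url.rstrip("/") + "/.api/graphql",  # assumed endpoint path
        data=payload.encode("utf-8"),  # a request body makes this a POST
        headers={
            "Authorization": "token " + access_token,  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

The returned request can then be sent with `urllib.request.urlopen(request)`; keeping construction separate from sending makes it easy to inspect the request first.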
To configure GitHub as an authentication provider (which will enable sign-in via GitHub), see the authentication documentation.
Using the webhooks property on the external service has been deprecated. Please consult this page in order to configure webhooks.
GitHub connections support the following configuration options, which are specified in the JSON editor in the site admin "Manage code hosts" area.
Sourcegraph displays search results from the default branch of a repository when no revision: parameter is specified. If you'd like the search results to be displayed from another branch by default, you may change a repo's default branch on the GitHub repository settings page. If this is not an option, consider using search contexts instead.
When Sourcegraph syncs repositories configured via repositoryQuery, it consumes GitHub API search rate limit, which is lower than the normal rate limit. The affiliated, public and none special values, however, trigger normal API requests instead of search API requests.
When the search rate limit quota is exhausted, an error like failed to list GitHub repositories for search: page=..., searchString=\"...\" can be found in logs. To work around this, try reducing the frequency with which repository syncing happens by setting a higher value (in minutes) of repoListUpdateInterval in your Sourcegraph site config.
repositoryQuery is the only repo syncing method that consumes GitHub search API quota, so if setting repoListUpdateInterval doesn't work, consider switching your syncing method to use another option, like orgs, or using one of the special values described above.
The repositoryQuery option "public" is valuable in that it allows Sourcegraph to sync all public repositories. However, it does not return whether or not a repo is archived, which can result in archived repos appearing in normal search. You can see an example of what is returned by the GitHub API for a query to "public" here.
If you would like to sync all public repositories while omitting archived repos, consider generating a GitHub token with access to only public repositories, then use repositoryQuery with option affiliated and an exclude entry with archived set to true, as seen in the example below:
{
  "url": "https://github.example.com",
  "gitURLType": "http",
  "repositoryPathPattern": "devs/{nameWithOwner}",
  "repositoryQuery": [
    "affiliated"
  ],
  "token": "TOKEN_WITH_PUBLIC_ACCESS",
  "exclude": [
    {
      "archived": true
    }
  ]
}