Issue 564: add size filtering and typed min access level #891

Open
WilliamZhang9 wants to merge 1 commit into sourcebot-dev:main from WilliamZhang9:u/williamzhang/issue_564_add_repo_size_exclude_option_for_gitlab

Conversation

@WilliamZhang9
@WilliamZhang9 WilliamZhang9 commented Feb 14, 2026

Problem

#564

Solution

Support excluding GitLab projects by size, using bounds checked against GitLab's project statistics, and pass minAccessLevel through to project listing with typing aligned to GitLab's access levels.

Summary by CodeRabbit

  • New Features

    • Add configurable GitLab minimum access level when listing projects.
    • Add configurable size-based exclusion (min/max bytes) to skip projects by storage usage.
  • Tests

    • Added unit tests for size-based exclusion covering below-min, above-max, within-range, and missing-size-statistics scenarios.

@coderabbitai
Contributor

coderabbitai bot commented Feb 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds GitLab project filtering by minimum access level and by project storage size; exports an AccessLevel enum; threads minAccessLevel and statistics into project-listing options; implements getProjectSizeBytes and size-based exclusion logic; updates schemas/types; adds unit tests for size exclusion.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend logic & tests: packages/backend/src/gitlab.ts, packages/backend/src/gitlab.test.ts | Adds exported AccessLevel enum and ProjectsAccessLevel alias; introduces projectListOptions (includes minAccessLevel and statistics) used across Projects/Groups/Users listing; replaces hard-coded perPage; implements getProjectSizeBytes and size-based checks in shouldExcludeProject; adds four unit tests covering size-boundaries and missing stats. |
| Schemas — JSON & public: schemas/v3/gitlab.json, packages/schemas/src/v3/gitlab.schema.ts, packages/schemas/src/v3/connection.schema.ts, packages/schemas/src/v3/index.schema.ts | Adds minAccessLevel enum (5,10,20,30,40,50) for GitLab groups and exclude.size object with min/max integer fields and additionalProperties: false. |
| Types (TS): packages/schemas/src/v3/connection.type.ts, packages/schemas/src/v3/gitlab.type.ts, packages/schemas/src/v3/index.type.ts | Adds `minAccessLevel?: 5 |

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant Config as rgba(66,133,244,0.5)
participant Backend as rgba(15,157,88,0.5)
participant GitLabAPI as rgba(219,68,55,0.5)
participant Filter as rgba(244,180,0,0.5)

Config->>Backend: provide GitLab connection (minAccessLevel?, exclude.size?)
Backend->>GitLabAPI: Projects.all / Groups.allProjects / Users.allProjects (projectListOptions: perPage, minAccessLevel?, statistics?)
GitLabAPI-->>Backend: paginated project list (includes project.statistics when requested)
Backend->>Filter: shouldExcludeProject(project, exclude.size?)
Filter-->>Backend: include/exclude decision (uses getProjectSizeBytes to read statistics)
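
The first two steps of the flow above can be sketched as a small options builder. This is a minimal sketch under stated assumptions, not the PR's code: the enum values follow GitLab's documented role numbers, while the member names, `GitlabConfigSketch`, `buildProjectListOptions`, and the `perPage` default of 100 are hypothetical and chosen only for illustration.

```typescript
// Numeric values follow GitLab's documented access levels; member names are illustrative.
enum AccessLevel {
    MinimalAccess = 5,
    Guest = 10,
    Reporter = 20,
    Developer = 30,
    Maintainer = 40,
    Owner = 50,
}

// Hypothetical, trimmed-down config shape (not the real schema type).
interface GitlabConfigSketch {
    minAccessLevel?: AccessLevel;
    exclude?: { size?: { min?: number; max?: number } };
}

interface ProjectListOptions {
    perPage: number;
    minAccessLevel?: AccessLevel;
    statistics?: boolean;
}

// Build one options object that every listing call can share.
// `statistics` is requested only when a size rule is configured.
function buildProjectListOptions(config: GitlabConfigSketch, perPage = 100): ProjectListOptions {
    return {
        perPage,
        ...(config.minAccessLevel !== undefined ? { minAccessLevel: config.minAccessLevel } : {}),
        ...(config.exclude?.size ? { statistics: true } : {}),
    };
}

const withSize = buildProjectListOptions({
    minAccessLevel: AccessLevel.Reporter,
    exclude: { size: { max: 1_000_000 } },
});
const plain = buildProjectListOptions({});
```

Keeping the optional keys out of the object entirely (rather than setting them to `undefined`) means a config without either setting sends only `perPage`.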

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: adding size filtering and typed min access level for GitLab projects, which are the primary features introduced across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
packages/backend/src/gitlab.ts (2)

23-23: Unused type alias ProjectsAccessLevel.

ProjectsAccessLevel is an identity alias for AccessLevel and is referenced only once (line 62). Consider using AccessLevel directly to reduce indirection.


279-292: Silent pass-through when statistics are unavailable.

When getProjectSizeBytes returns undefined (e.g., insufficient permissions for statistics), the project silently passes the size filter. Consider logging a warning so operators can diagnose why a project wasn't excluded despite a size rule being configured.

📝 Suggested improvement
         if (exclude?.size) {
             const projectSizeBytes = getProjectSizeBytes(project);
             if (projectSizeBytes !== undefined) {
                 if (exclude.size.min !== undefined && projectSizeBytes < exclude.size.min) {
                     reason = `project size (${projectSizeBytes}) is less than \`exclude.size.min\` (${exclude.size.min})`;
                     return true;
                 }
 
                 if (exclude.size.max !== undefined && projectSizeBytes > exclude.size.max) {
                     reason = `project size (${projectSizeBytes}) is greater than \`exclude.size.max\` (${exclude.size.max})`;
                     return true;
                 }
+            } else {
+                logger.warn(`Size filter configured but no statistics available for project ${projectName}. Skipping size-based exclusion.`);
             }
         }
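
The branch structure in the suggestion above can be exercised in isolation. The following is a sketch only: `sizeExclusionReason` and `SizeRule` are hypothetical names, and the warning callback is injected so the example does not depend on the project's logger.

```typescript
interface SizeRule {
    min?: number;
    max?: number;
}

// Returns an exclusion reason, or undefined when the project should be kept.
// `warn` is passed in so the sketch stays independent of any logger.
function sizeExclusionReason(
    projectName: string,
    sizeBytes: number | undefined,
    rule: SizeRule,
    warn: (message: string) => void,
): string | undefined {
    if (sizeBytes === undefined) {
        // No statistics: keep the project, but leave a trace for operators.
        warn(`Size filter configured but no statistics available for project ${projectName}. Skipping size-based exclusion.`);
        return undefined;
    }
    if (rule.min !== undefined && sizeBytes < rule.min) {
        return `project size (${sizeBytes}) is less than exclude.size.min (${rule.min})`;
    }
    if (rule.max !== undefined && sizeBytes > rule.max) {
        return `project size (${sizeBytes}) is greater than exclude.size.max (${rule.max})`;
    }
    return undefined;
}

const warnings: string[] = [];
const collect = (message: string): void => {
    warnings.push(message);
};
const rule: SizeRule = { min: 100, max: 1000 };

const tooSmall = sizeExclusionReason("group/a", 50, rule, collect);
const atLowerBound = sizeExclusionReason("group/b", 100, rule, collect);
const noStats = sizeExclusionReason("group/c", undefined, rule, collect);
```

Both bounds are inclusive here, matching the strict `<` and `>` comparisons in the diff: a project exactly at `min` or `max` is kept.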

@WilliamZhang9 WilliamZhang9 force-pushed the u/williamzhang/issue_564_add_repo_size_exclude_option_for_gitlab branch 3 times, most recently from 3d0627a to 40a7708 Compare February 14, 2026 15:01
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/backend/src/gitlab.ts (1)

183-219: ⚠️ Potential issue | 🟠 Major

Size filtering won't apply to individually specified projects.

The config.projects path uses api.Projects.show(project) which does not include statistics: true from projectListOptions. As a result, getProjectSizeBytes will return undefined for these projects, and size-based exclusion will be silently skipped. If a user configures both projects and exclude.size, they may not get the expected behavior.

Consider either:

  1. Passing statistics: true to Projects.show() when exclude.size is configured, or
  2. Documenting that size filtering only applies to projects discovered via all, groups, or users.
Proposed fix
             const { durationMs, data } = await measure(async () => {
-                    const fetchFn = () => api.Projects.show(project);
+                    const fetchFn = () => api.Projects.show(project, {
+                        ...(config.exclude?.size ? { statistics: true } : {}),
+                    });
                     return fetchWithRetry(fetchFn, `project ${project}`, logger);
                 });
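
To see why the extra option matters, here is a sketch using a stand-in client. `fakeProjects` is not the real API client; it only imitates the behaviour the comment describes (statistics are returned only when requested), and `showOptionsFor` and `fetchConfiguredProject` are hypothetical helpers.

```typescript
interface ProjectSketch {
    id: string;
    statistics?: { storage_size: number };
}

interface ShowOptions {
    statistics?: boolean;
}

type ExcludeSketch = { size?: { min?: number; max?: number } };

// Stand-in for an API client: it returns statistics only when asked to.
const fakeProjects = {
    async show(id: string, options: ShowOptions = {}): Promise<ProjectSketch> {
        return options.statistics ? { id, statistics: { storage_size: 2048 } } : { id };
    },
};

// Request statistics for single-project fetches only when a size rule is set.
function showOptionsFor(exclude?: ExcludeSketch): ShowOptions {
    return exclude?.size ? { statistics: true } : {};
}

// With a size rule the fetched project carries statistics; without one it does not.
async function fetchConfiguredProject(id: string, exclude?: ExcludeSketch): Promise<ProjectSketch> {
    return fakeProjects.show(id, showOptionsFor(exclude));
}
```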
🤖 Fix all issues with AI agents
In `@packages/schemas/src/v3/connection.type.ts`:
- Around line 138-141: Update the union type for minAccessLevel in the
Connection type: remove the invalid value 0 and add 15 (Planner) so the field
reads minAccessLevel?: 5 | 10 | 15 | 20 | 30 | 40 | 50; — locate the
minAccessLevel property in connection.type.ts and adjust the allowed numeric
literals accordingly to match GitLab's API.
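
A sketch of how the union proposed in the prompt above could be derived from a single list and checked at runtime. The names `MIN_ACCESS_LEVELS`, `MinAccessLevel`, and `isMinAccessLevel` are hypothetical; the values are the ones listed in that prompt.

```typescript
// Values taken from the union proposed above: 5 | 10 | 15 | 20 | 30 | 40 | 50.
const MIN_ACCESS_LEVELS = [5, 10, 15, 20, 30, 40, 50] as const;

// Deriving the type from the array keeps the list and the union in sync.
type MinAccessLevel = (typeof MIN_ACCESS_LEVELS)[number];

// Runtime guard for values that arrive from untyped config.
function isMinAccessLevel(value: unknown): value is MinAccessLevel {
    return typeof value === "number" && (MIN_ACCESS_LEVELS as readonly number[]).indexOf(value) !== -1;
}

const fromConfig: unknown = 15;
const level: MinAccessLevel | undefined = isMinAccessLevel(fromConfig) ? fromConfig : undefined;
```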
🧹 Nitpick comments (4)
packages/schemas/src/v3/gitlab.schema.ts (1)

167-180: Consider enforcing non-negative size bounds.

Adding a minimum of 0 for min/max prevents negative sizes in config. Since this file is auto-generated, update the schema source/generator accordingly.

♻️ Suggested schema tweak
             "min": {
               "type": "integer",
+              "minimum": 0,
               "description": "Minimum project size (in bytes) to sync (inclusive). Projects smaller than this will be excluded."
             },
             "max": {
               "type": "integer",
+              "minimum": 0,
               "description": "Maximum project size (in bytes) to sync (inclusive). Projects larger than this will be excluded."
             }
schemas/v3/gitlab.json (1)

139-153: Schema definition for exclude.size looks good.

additionalProperties: false is correctly set. Consider adding minimum: 0 to the min and max properties to prevent nonsensical negative byte values at the schema validation level, though this is a minor hardening.
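
For illustration, the effect of `"type": "integer"` plus `"minimum": 0` on each bound can be mirrored in a small runtime check. This is a sketch, not the project's validator; `validateSizeBounds` is a hypothetical helper.

```typescript
// Mirrors `"type": "integer", "minimum": 0` for each configured bound.
function validateSizeBounds(size: { min?: number; max?: number }): string[] {
    const errors: string[] = [];
    for (const key of ["min", "max"] as const) {
        const value = size[key];
        if (value === undefined) {
            continue; // Both bounds are optional.
        }
        if (!Number.isInteger(value)) {
            errors.push(`exclude.size.${key} must be an integer`);
        } else if (value < 0) {
            errors.push(`exclude.size.${key} must be >= 0`);
        }
    }
    return errors;
}

const negativeMin = validateSizeBounds({ min: -1 });
const validRange = validateSizeBounds({ min: 0, max: 10 });
```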

packages/backend/src/gitlab.ts (2)

279-292: Silent skip when statistics are missing could be surprising.

When exclude.size is configured but getProjectSizeBytes returns undefined (e.g., statistics not available), the project passes through without any size check or warning. Consider logging a debug/warning message when statistics are expected but missing, so operators can diagnose why size filtering isn't working for specific projects.

Proposed enhancement
         if (exclude?.size) {
             const projectSizeBytes = getProjectSizeBytes(project);
             if (projectSizeBytes !== undefined) {
                 if (exclude.size.min !== undefined && projectSizeBytes < exclude.size.min) {
                     reason = `project size (${projectSizeBytes}) is less than \`exclude.size.min\` (${exclude.size.min})`;
                     return true;
                 }
 
                 if (exclude.size.max !== undefined && projectSizeBytes > exclude.size.max) {
                     reason = `project size (${projectSizeBytes}) is greater than \`exclude.size.max\` (${exclude.size.max})`;
                     return true;
                 }
+            } else {
+                logger.debug(`No statistics available for project ${projectName}, skipping size-based exclusion check.`);
             }
         }

325-356: getProjectSizeBytes silently falls back from storage_size to repository_size — these metrics differ significantly.

storage_size includes the entire project (repo, LFS, packages, artifacts, etc.), while repository_size is just the git repository. Falling through from one to the other without any indication could lead to inconsistent filtering behavior across projects. The field name priority should be documented in the schema description or in a code comment explaining the preference order.

Also, the dual snake_case/camelCase handling is a good defensive measure against @gitbeaker/rest response serialization differences.
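
One way to make the preference order explicit is to return which metric was used alongside the value. This is a sketch of that idea, not the PR's `getProjectSizeBytes`: `readProjectSize`, `StatisticsSketch`, and `SizeSource` are hypothetical names, and only the four fields discussed above are modelled.

```typescript
// Only the fields this sketch reads; real API responses carry more.
interface StatisticsSketch {
    storage_size?: number;
    storageSize?: number;
    repository_size?: number;
    repositorySize?: number;
}

type SizeSource = "storage_size" | "repository_size";

// Prefer storage_size (whole project) and fall back to repository_size (git data only).
// Returning the source lets the caller log or document which metric applied.
function readProjectSize(stats?: StatisticsSketch): { bytes: number; source: SizeSource } | undefined {
    const storage = stats?.storage_size ?? stats?.storageSize;
    if (typeof storage === "number") {
        return { bytes: storage, source: "storage_size" };
    }
    const repository = stats?.repository_size ?? stats?.repositorySize;
    if (typeof repository === "number") {
        return { bytes: repository, source: "repository_size" };
    }
    return undefined;
}

const preferred = readProjectSize({ storageSize: 10, repository_size: 5 });
const fallback = readProjectSize({ repositorySize: 5 });
const missing = readProjectSize(undefined);
```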

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/backend/src/gitlab.ts`:
- Around line 64-73: projectListOptions only sets statistics: true when
config.exclude?.size is present, but the Projects.show() call (used to fetch
individually listed projects) doesn't pass that option, so project.statistics
stays undefined and getProjectSizeBytes() can't filter by size; update the call
to Projects.show(...) to pass the same options object (or at least { statistics:
true } when config.exclude?.size) as the second argument so that
project.statistics is populated for size filtering—use the existing
projectListOptions or a conditional { statistics: true } when calling
Projects.show in the code path that loads individual projects.
🧹 Nitpick comments (2)
packages/backend/src/gitlab.ts (1)

279-292: Silent skip when statistics are unavailable — consider logging a warning.

When exclude.size is configured but getProjectSizeBytes returns undefined (e.g., statistics not available for the project), the size filter is silently bypassed. A debug/warn log here would help users troubleshoot why certain projects aren't being filtered.

Proposed improvement
         if (exclude?.size) {
             const projectSizeBytes = getProjectSizeBytes(project);
             if (projectSizeBytes !== undefined) {
                 if (exclude.size.min !== undefined && projectSizeBytes < exclude.size.min) {
                     reason = `project size (${projectSizeBytes}) is less than \`exclude.size.min\` (${exclude.size.min})`;
                     return true;
                 }
 
                 if (exclude.size.max !== undefined && projectSizeBytes > exclude.size.max) {
                     reason = `project size (${projectSizeBytes}) is greater than \`exclude.size.max\` (${exclude.size.max})`;
                     return true;
                 }
+            } else {
+                logger.warn(`Size exclusion configured but no statistics available for project ${projectName}. Skipping size filter.`);
             }
         }
schemas/v3/gitlab.json (1)

138-153: Consider clarifying which GitLab size field is used.

The description says "GitLab statistics size fields" but doesn't specify which field. The backend implementation (getProjectSizeBytes) preferentially uses storage_size (which includes LFS, artifacts, etc.) and falls back to repository_size. Documenting this in the schema description would help users set appropriate thresholds.

Suggested description improvement
-                    "description": "Exclude projects based on GitLab statistics size fields (in bytes).",
+                    "description": "Exclude projects based on GitLab statistics size fields (in bytes). Uses `storage_size` when available (includes LFS, artifacts, etc.), falling back to `repository_size`.",

Support excluding GitLab projects by statistics-backed size bounds and pass through minAccessLevel for project listing with AccessLevel-aligned typing.
@WilliamZhang9 WilliamZhang9 force-pushed the u/williamzhang/issue_564_add_repo_size_exclude_option_for_gitlab branch from 525570a to a65ffd8 Compare February 15, 2026 01:21
@WilliamZhang9
Author

It looks like only users with the Reporter, Developer, Maintainer, or Owner roles can retrieve a GitLab project’s size. Do we need to use minAccessLevel to enforce the same limit in Sourcebot? Alternatively, we can rely on the GitLab response and filter projects by size only when the size data is returned.
https://docs.gitlab.com/api/projects/#list-all-projects
