Windows Compatibility Plan for Borg Backup
Overview
This document outlines the plan to improve Windows compatibility in Borg Backup, focusing on path handling, pattern matching, and archive operations. The goal is to ensure that Borg works correctly on Windows while maintaining cross-platform archive compatibility.
Core Strategy
Forward Slash Standard
- Internal representation: All paths within Borg (in archives, pattern matching, path manipulation) use forward slashes (/) as the path separator.
- Rationale: Forward slashes work on all platforms (POSIX and Windows), simplifying internal logic and ensuring cross-platform archive compatibility.
Boundary Normalization
- Incoming normalization (Windows): Convert backslashes (\) to forward slashes (/) at entry points where paths enter Borg from the Windows filesystem or user input.
  - Exception: User-provided patterns (include/exclude patterns, regex patterns, etc.) must NOT be normalized, because normalization would break complex patterns. Windows users are expected to provide POSIX-style patterns using forward slashes (/), just like POSIX users. Normalization applies only to filesystem paths (e.g., during archive creation), not to patterns.
- Outgoing normalization: Minimal or no conversion when paths leave Borg for the filesystem, as Windows APIs accept forward slashes.
- Literal backslashes from POSIX: When extracting archives created on POSIX systems that contain literal backslashes in filenames, replace \ with % on Windows to prevent misinterpretation as path separators.
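The two directions of boundary normalization can be sketched as a pair of tiny functions. The names to_internal_path and to_extract_name, the IS_WIN detection, and the is_win parameter are hypothetical and only illustrate the rules above; in the plan itself the extraction side lives in make_path_safe and to_safe_link_target.

```python
import os

IS_WIN = os.name == "nt"  # assumption: how the platform is detected


def to_internal_path(fs_path: str, is_win: bool = IS_WIN) -> str:
    """Incoming boundary normalization for a filesystem path (hypothetical helper)."""
    # On Windows "\" is always a separator, so mapping it to "/" is safe.
    # On POSIX "\" is an ordinary filename character and must be kept.
    # Patterns must never be passed through this function.
    return fs_path.replace("\\", "/") if is_win else fs_path


def to_extract_name(archived_path: str, is_win: bool = IS_WIN) -> str:
    """Map literal backslashes from a POSIX archive to "%" on Windows (hypothetical helper)."""
    return archived_path.replace("\\", "%") if is_win else archived_path
```

On Windows, to_internal_path(r"docs\report.txt") yields "docs/report.txt", while on POSIX the same string is stored unchanged because the backslash is part of the filename.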
Security
- Prevent directory traversal attacks by rejecting paths containing \.. or ..\ patterns, even on POSIX systems.
- This ensures that archives created on one platform cannot exploit path handling differences on another platform.
Replacement Character Choice
Why % as the replacement character:
- It's a valid filename character on both POSIX and Windows
- It's relatively uncommon in typical filenames
- It's visually distinct from path separators
Known limitation:
- Collisions are possible when two POSIX filenames differ only in that one contains % where the other contains \. Both map to the same name on Windows (e.g., file%name.txt and file\name.txt both become file%name.txt).
- This is an acceptable trade-off for simplicity.
- Users can avoid collisions by not using % in POSIX filenames when creating archives intended for Windows extraction.
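To make the collision rule concrete, here is a sketch of how one might detect affected names before extracting on Windows. find_windows_collisions is a hypothetical helper for illustration; nothing in this plan says Borg will ship such a check.

```python
from collections import defaultdict


def find_windows_collisions(paths):
    """Group archive paths that map to the same name after the \\ -> % replacement.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[p.replace("\\", "%")].append(p)
    # Only names reached by more than one archived path are collisions.
    return {target: sources for target, sources in groups.items() if len(sources) > 1}


print(find_windows_collisions(["file%name.txt", "file\\name.txt", "other.txt"]))
```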
Windows Drive Letters
- Handling of Windows drive letters (e.g., C:) in archive paths is explicitly deferred and out of scope for this phase. The current behavior (if any) should remain unchanged.
Keep it simple
- Avoid complex platform-specific logic where possible.
- Leverage the fact that Windows APIs accept forward slashes in most contexts.
Critical: os.path.normpath vs posixpath.normpath
The Problem:
- os.path.normpath is platform-dependent:
  - On POSIX: collapses .. and ., removes redundant /, and keeps / as the separator.
  - On Windows: collapses .. and ., removes redundant separators, and converts / to \.
- Using os.path.normpath on Windows breaks the forward-slash standard by converting every / to \.
The Solution:
- Always use posixpath.normpath instead of os.path.normpath for internal path normalization.
- posixpath.normpath always uses / as the separator, regardless of the platform.
- This ensures consistent behavior across platforms and maintains the forward-slash standard.
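Because the standard-library modules ntpath (the implementation behind os.path on Windows) and posixpath can both be imported on any platform, the difference can be demonstrated without a Windows machine:

```python
import ntpath     # what os.path is on Windows
import posixpath  # what os.path is on POSIX

raw = "archive/dir/../file.txt"

# os.path.normpath on Windows behaves like ntpath.normpath: "/" becomes "\".
windows_result = ntpath.normpath(raw)
# posixpath.normpath gives the same result on every platform.
portable_result = posixpath.normpath(raw)

print(windows_result)   # archive\file.txt
print(portable_result)  # archive/file.txt
```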
Where this matters:
- Pattern matching (patterns.py)
- Path manipulation in archive creation (create_cmd.py)
- Path specifications (parseformat.py)
- Helper functions (fs.py)
- Archive operations (archive.py)
Critical: os.path.join vs posixpath.join
The Problem:
os.path.join()usesos.sep(backslash on Windows) to join path components- This breaks the forward-slash standard by introducing backslashes into internal paths
The Solution:
- Use posixpath.join() instead of os.path.join() for internal path operations.
- posixpath.join() always uses / as the separator, regardless of platform.
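The same trick shows what os.path.join would do on Windows. Note that ntpath.join does not rewrite the existing "/" in its first argument; it only inserts "\" between components, so the result mixes both separators:

```python
import ntpath
import posixpath

# os.path.join on Windows is ntpath.join: components are glued with "\".
print(ntpath.join("home/user", "docs", "a.txt"))     # home/user\docs\a.txt
# posixpath.join always uses "/", keeping internal paths in the standard form.
print(posixpath.join("home/user", "docs", "a.txt"))  # home/user/docs/a.txt
```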
Where this matters:
- Archive creation (create_cmd.py)
- Path specifications (parseformat.py)
- Any internal path manipulation that should maintain the forward-slash standard
Exception:
- os.path.join() may be acceptable when joining filesystem paths for OS operations (e.g., extraction destination paths), but posixpath.join() works there as well, since Windows APIs accept forward slashes.
Critical: Avoid os.path.abspath() and os.path.realpath()
The Problem:
- Both functions call os.path.normpath() internally.
- On Windows, this converts every / to \.
The Solution:
- For internal paths: apply boundary normalization first, then use posixpath.normpath().
- Avoid os.path.abspath() and os.path.realpath() for internal path handling.
- If absolute paths are needed, construct them using posixpath.join() with a normalized base path.
Where this matters:
- Path specifications (parseformat.py)
- Any code that needs to resolve relative paths to absolute paths for archive storage
- Implementation note: To get an absolute path on Windows for use with posixpath:
  - Get the CWD: cwd = os.getcwd()
  - Normalize the CWD (boundary normalization): cwd = cwd.replace('\\', '/')
  - Use posixpath.join(cwd, path) (assuming path is already normalized), then apply posixpath.normpath() to the result, as os.path.abspath would
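The implementation note above translates into a few lines. posix_abspath is a hypothetical name, and the optional cwd and is_win parameters are assumptions added so the behavior can be tested without changing directories or running on Windows.

```python
import os
import posixpath
from typing import Optional

IS_WIN = os.name == "nt"


def posix_abspath(path: str, cwd: Optional[str] = None, is_win: bool = IS_WIN) -> str:
    """Forward-slash replacement for os.path.abspath (hypothetical helper).

    `path` must already be boundary-normalized ("/" separators only).
    `cwd` may be passed explicitly, e.g. in tests; otherwise os.getcwd() is used.
    """
    if cwd is None:
        cwd = os.getcwd()
    if is_win:
        cwd = cwd.replace("\\", "/")  # boundary normalization of the base path
    # posixpath.join returns `path` unchanged if it already starts with "/".
    return posixpath.normpath(posixpath.join(cwd, path))
```

posixpath has no notion of drive letters: "C:/Users/me" is simply treated as a path whose first element is "C:". That is acceptable here only because drive-letter handling is deferred.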
Section 1: Path Sanitization and Boundary Normalization
Entry Points for Normalization
1. src/borg/helpers/parseformat.py (PathSpec)
Current behavior: Handles path specifications from the command line.
Changes needed:
- Normalize backslashes to forward slashes on Windows only for filesystem paths (paths being archived).
- Patterns must NOT be normalized. Patterns can be complex (especially regex patterns) and normalization would break them.
- Windows users are expected to provide POSIX-style patterns with forward slashes.
2. src/borg/archiver/create_cmd.py (do_create and _rec_walk)
Current behavior: Walks the filesystem to create archives.
Changes needed:
- In _rec_walk, normalize paths from the filesystem walker to use forward slashes on Windows.
- This ensures all paths entering the archive use the forward-slash standard.
3. Archive Reading (Item.path and Item.target)
Current behavior: Reads paths from archives.
Changes needed:
Item.path
- Already uses decode=to_sanitized_path.
- Ensure to_sanitized_path in fs.py calls make_path_safe, which handles the replacement of literal backslashes with % on Windows.
Item.target
- Used in Borg 2 for symlink targets only. Hardlinks are identified by hlid.
Encoding (Archive Creation on Windows):
- Symlink targets from the Windows filesystem must be normalized to use forward slashes before storing in the archive.
- Example: ..\sibling → ../sibling, C:\foo\bar → C:/foo/bar
- This can be done in the Item.target encode method.
- Add an encode parameter to the Item.target property definition in item.pyx that calls a new helper function (e.g., encode_link_target) in fs.py to normalize \ to / on Windows.
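A possible shape for the proposed encode_link_target helper is sketched below. The name comes from the plan; the is_win parameter is an assumption for testability, and the wiring into the Item.target property in item.pyx is not shown.

```python
import os

IS_WIN = os.name == "nt"


def encode_link_target(target: str, is_win: bool = IS_WIN) -> str:
    """Normalize a symlink target read from the filesystem before it is stored.

    Sketch only; the real signature in fs.py may differ.
    """
    # On Windows "\" in a link target is always a separator; on POSIX it is a
    # literal filename character and must be stored unchanged.
    return target.replace("\\", "/") if is_win else target
```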
Decoding (Archive Reading on Windows):
- Add decode=to_safe_link_target to handle literal backslashes from POSIX archives.
- When a POSIX archive contains a symlink target with a literal backslash in a filename (e.g., foo\bar as a single filename component), apply the replacement character (%) on Windows to prevent misinterpretation as a path separator.
Extraction on Windows:
- Symlink targets stored with / can be used as-is during extraction.
- The Windows API (including os.symlink()) accepts forward slashes in symlink targets.
- No conversion back to \ is needed.
Item.source
- Legacy field from Borg 1.x, used for both symlink and hardlink targets in Borg 1.x.
- In Borg 2, Item.source is ONLY used during repository transfer from Borg 1.x to Borg 2.
- At transfer time, Borg 2 will upgrade the item:
  - For symlinks, Item.source is used as-is (no changes).
  - For hardlinks, Item.source will be transformed into Item.hlid by existing code. The upgraded item will have neither an Item.source field nor an Item.target field.
Helper Functions in src/borg/helpers/fs.py
make_path_safe
- Replace literal \ with % on Windows for file paths.
- Add a security check that rejects \.. and /.. equivalently, and also rejects ..\ and ../ equivalently, even on POSIX, to prevent cross-platform directory traversal.
- This security check applies only to file paths (Item.path) and NOT to link targets (Item.target).
to_safe_link_target
- New helper to replace literal \ with % on Windows for link targets.
- This function should be called from the Item.target decode method.
- Unlike make_path_safe, this function should NOT apply the security check for \.. patterns (as per the specification above, security checks apply only to Item.path).
get_strip_prefix
- Clarification: Boundary normalization (conversion of \ to / on Windows) must happen before this function is called.
- The function only needs to detect the slashdot hack using /./ and does not need to care about backslashes.
- Critical fix: In the return statement, replace return os.path.normpath(path[:pos]) + os.sep with return posixpath.normpath(path[:pos]) + "/" to maintain the forward-slash standard.
- The current code uses os.path.normpath, which would convert / to \ on Windows, and appends os.sep, which is \ on Windows.
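A sketch of get_strip_prefix with the fix applied follows. Only the return statement is taken from the plan; the surrounding structure (path.find("/./") and the pos > 0 test for a non-empty prefix) is an assumption about what the function looks like.

```python
import posixpath
from typing import Optional


def get_strip_prefix(path: str) -> Optional[str]:
    """Sketch of get_strip_prefix with the proposed fix applied.

    `path` must already be boundary-normalized (only "/" separators).
    Returns the prefix to strip, ending in exactly one "/", or None.
    """
    pos = path.find("/./")  # detect the slashdot hack
    if pos > 0:
        # posixpath.normpath + "/" instead of os.path.normpath + os.sep
        return posixpath.normpath(path[:pos]) + "/"
    return None  # no (or empty) prefix: nothing to strip
```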
remove_dotdot_prefixes
- Clarification: Boundary normalization (conversion of \ to / on Windows) must happen before this function is called.
- The function does not need to care about backslashes.
- Remove the redundant backslash normalization (replace("\\", "/")), as this is now handled by boundary normalization.
- Drive letter handling should remain as is.
Section 2: Pattern Matching
Files to Update
src/borg/patterns.py
Current behavior: Implements pattern matching for include/exclude rules.
Changes needed:
- Critical: Replace all os.path.normpath with posixpath.normpath to prevent / → \ conversion on Windows.
- Replace all os.path.sep with / in pattern matching logic.
- Ensure that patterns are matched against paths using forward slashes.
- The pattern matcher should only handle / as the separator.
Specific changes by pattern class:
- PathFullPattern._prepare(): Replace os.path.normpath(pattern).lstrip(os.path.sep) with posixpath.normpath(pattern).lstrip("/")
- PathPrefixPattern._prepare(): Replace os.path.sep with "/" and os.path.normpath with posixpath.normpath
- FnmatchPattern._prepare(): Replace os.path.sep with "/" and os.path.normpath with posixpath.normpath
- ShellPattern._prepare(): Replace os.path.sep with "/" and os.path.normpath with posixpath.normpath
- RegexPattern._match(): Already handles separator normalization correctly; no changes needed
- Pattern _match() methods in PathPrefixPattern, FnmatchPattern, and ShellPattern: Replace os.path.sep with "/"
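As an illustration of the first two changes, the standalone functions below stand in for the _prepare() and _match() methods. prepare_full follows the replacement quoted above. The exact shape of the prefix logic (one trailing "/", no leading "/") is an assumption modeled on how a prefix pattern typically avoids matching "home/username" for "home/user".

```python
import posixpath


def prepare_full(pattern: str) -> str:
    # PathFullPattern._prepare() with the proposed change
    return posixpath.normpath(pattern).lstrip("/")


def prepare_prefix(pattern: str) -> str:
    # PathPrefixPattern-style preparation (assumed shape): normalized,
    # exactly one trailing "/", and no leading "/".
    return (posixpath.normpath(pattern).rstrip("/") + "/").lstrip("/")


def match_prefix(prepared: str, path: str) -> bool:
    # Appending "/" lets "home/user" match itself but not "home/username".
    return (path + "/").startswith(prepared)
```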
User Expectation
Important: Windows users are expected to provide the same patterns as POSIX (Unix) users.
- Patterns must use forward slashes (/) as path separators.
- Patterns can be rather complex (especially regex patterns).
- We cannot and must not normalize these patterns.
- Patterns must be used exactly as provided by the user.
Rationale: Normalizing patterns would break complex patterns, especially regex patterns. By requiring Windows users to use POSIX-style patterns, we maintain consistency and avoid breaking pattern logic.
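A one-line regex shows why patterns must be left untouched. In regex syntax, the backslash escapes a dot, so blindly converting \ to / changes the pattern's meaning:

```python
import re

user_regex = r"\.txt$"  # "ends with .txt", as a POSIX user would write it
archived_path = "docs/a.txt"

# Used exactly as provided: the pattern matches.
assert re.search(user_regex, archived_path)

# A naive "\ -> /" normalization turns the escaped dot into "/" followed by
# any character, so the pattern silently stops matching.
normalized = user_regex.replace("\\", "/")
print(normalized)                                  # /.txt$
print(bool(re.search(normalized, archived_path)))  # False
```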
Section 3: Archive Creation
Files to Update
src/borg/archiver/create_cmd.py
Current behavior: Implements the borg create command.
Changes needed:
- In _rec_walk, normalize the path variable to use forward slashes immediately after receiving it from the filesystem walker, before passing it to the matcher or any other processing.
- This ensures all downstream operations work with normalized paths.
- Convert path to use forward slashes at the start of the creation loop.
- Critical: Replace os.path.normpath with posixpath.normpath to prevent / → \ conversion on Windows:
  - In the do_create() method: when normalizing paths from command-line arguments
  - In the _rec_walk() method: when joining path with a directory entry name, use posixpath.normpath(posixpath.join(path, dirent.name))
- Critical: Replace os.path.join with posixpath.join to prevent backslash introduction on Windows:
  - When joining path with dirent.name in _rec_walk()
  - When joining path with tag_name for cache tag handling
  - Any other path joining operations for archive paths (not filesystem destination paths)
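The path handling in _rec_walk can be illustrated with a heavily simplified, hypothetical walker. It omits the matcher, cache tags, and item creation, and is not Borg's actual _rec_walk. It only shows where boundary normalization happens and how posixpath.join and posixpath.normpath keep child paths in the forward-slash form.

```python
import os
import posixpath

IS_WIN = os.name == "nt"


def rec_walk(path: str, is_win: bool = IS_WIN):
    """Yield archive-style paths for `path` and everything below it (sketch)."""
    if is_win:
        # Boundary normalization right after the path enters from the walker.
        path = path.replace("\\", "/")
    yield path  # this is where the matcher / item processing would see the path
    if os.path.isdir(path) and not os.path.islink(path):
        with os.scandir(path) as entries:
            for dirent in sorted(entries, key=lambda e: e.name):
                # posixpath keeps "/" on Windows too; os.path.join would insert "\".
                child = posixpath.normpath(posixpath.join(path, dirent.name))
                yield from rec_walk(child, is_win)
```

Passing the forward-slash path back to os.scandir works on Windows because the API accepts "/" as a separator.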
Section 4: Archive Path Manipulation
Files to Update
src/borg/archiver/extract_cmd.py
Current behavior: Implements the borg extract command with --strip-components.
Changes needed:
- In extract_cmd.py, replace os.sep with / when stripping path components.
- Ensure that the strip_components logic works with forward slashes.
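Here is a sketch of --strip-components on forward-slash archive paths, together with what splitting on os.sep would do on Windows. strip_components is a hypothetical helper name; the convention that an empty result means "skip the item" is an assumption.

```python
def strip_components(archived_path: str, n: int) -> str:
    """Drop the first `n` elements of an archive path (hypothetical helper).

    Archive paths always use "/", so split on "/" rather than os.sep.
    An empty result means nothing is left and the item would be skipped.
    """
    return "/".join(archived_path.split("/")[n:])


print(strip_components("home/user/docs/a.txt", 2))  # docs/a.txt

# What splitting on os.sep would do on Windows, where os.sep == "\\":
# no separator is found, so the whole path counts as a single element.
buggy = "\\".join("home/user/docs/a.txt".split("\\")[2:])
print(repr(buggy))  # '' -> the item would be skipped entirely
```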
src/borg/archive.py (Archive.create_helper)
Current behavior: Helper method for archive creation.
Changes needed:
- Review Archive.create_helper for any uses of os.sep in path operations and replace them with /.
- In Archive.create_helper, replace os.sep with / when checking for prefixes and stripping them.
- Critical: Replace os.path.normpath with posixpath.normpath in the tar import functionality:
  - When normalizing tarinfo.name during tar import
  - When normalizing tarinfo.linkname during tar import
- These normalizations must preserve forward slashes to maintain the forward-slash standard.
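For tar import, the change amounts to normalizing both fields with posixpath. normalized_tar_paths is a hypothetical helper, not the code in archive.py. The guard on linkname is a suggested detail, since posixpath.normpath("") returns "." and regular members have an empty linkname.

```python
import posixpath
import tarfile


def normalized_tar_paths(tarinfo: tarfile.TarInfo):
    """Return (name, linkname) normalized for the archive (hypothetical helper).

    linkname is only meaningful for symlinks and hardlinks; for other members
    it is empty, and normalizing it would turn "" into ".".
    """
    name = posixpath.normpath(tarinfo.name)
    linkname = None
    if tarinfo.issym() or tarinfo.islnk():
        linkname = posixpath.normpath(tarinfo.linkname)
    return name, linkname
```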
src/borg/helpers/parseformat.py (PathSpec)
Current behavior: Handles path specifications from the command line.
Changes needed:
- Critical: Replace os.path.normpath with appropriate handling in the PathSpec class:
  - For pattern paths: use posixpath.normpath directly (no boundary normalization needed)
  - For filesystem paths: apply boundary normalization first (convert \ to / on Windows), then use posixpath.normpath
- Critical: Replace os.path.abspath with os.getcwd() + boundary normalization + posixpath.normpath:
  - When resolving filesystem paths to absolute paths, first apply boundary normalization (\ → / on Windows).
  - Like os.path.abspath, but use os.getcwd() (normalized to /) as the base if needed.
  - Construct absolute paths using posixpath.join() with the normalized CWD.
- Ensure boundary normalization happens BEFORE posixpath.normpath for filesystem paths.
- Patterns must NOT have their backslashes normalized (they should be treated as literal characters).
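The pattern versus filesystem-path split in PathSpec might look like this. normalize_path_arg and its parameters are hypothetical, and absolute-path resolution is left out.

```python
import os
import posixpath

IS_WIN = os.name == "nt"


def normalize_path_arg(text: str, is_pattern: bool, is_win: bool = IS_WIN) -> str:
    """Hypothetical stand-in for the normalization step in PathSpec."""
    if is_pattern:
        # Patterns: posixpath.normpath only; "\" stays a literal character.
        return posixpath.normpath(text)
    if is_win:
        # Filesystem paths: boundary normalization must come BEFORE normpath,
        # otherwise posixpath.normpath does not see "\" as a separator.
        text = text.replace("\\", "/")
    return posixpath.normpath(text)
```

The ordering matters. Running posixpath.normpath("data\\..\\docs") first and converting afterwards leaves "data/../docs" uncollapsed, whereas the order above yields "docs".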
Extraction on Windows
Important clarification:
- No conversion back to native paths (backslashes) is needed when extracting files on Windows.
- The Windows API accepts forward slashes as path separators for file paths (not just symlink targets).
- Archive paths with / can be used directly for filesystem operations.
- The replacement character (%) representing literal backslashes from POSIX filenames must be extracted as-is.
- Rationale: This avoids handing Windows a backslash that was never meant to be a path separator.
Section 5: FUSE Operations (Deferred)
Status
- FUSE support on Windows is limited and not a priority for this phase.
- Changes to FUSE code are deferred to a future phase.
Future Considerations
- If FUSE is implemented on Windows, ensure that paths are normalized consistently.
- Consider whether FUSE should be disabled or skipped in tests until proper Windows support is added.
Section 6: Test Suite Updates
1. Existing Test Updates
- rejected_dotdot_paths: Update to include \.. and ..\ patterns for security validation.
- test_regex_pattern: Update to ensure regex patterns work with the forward-slash standard.
- test_archived_paths: Simplify after boundary normalization is implemented.
- New test cases: Add Windows-style path tests in test_create.py, test_extract.py, and test_patterns.py.
2. Symlink and Hardlink Handling
- Test symlink target normalization on Windows: Create a symlink with a backslash target (e.g., ..\sibling) on Windows, create an archive, and verify the target is stored with forward slashes in the archive.
- Test symlink extraction on Windows: Extract an archive containing symlinks with forward-slash targets on Windows and verify the symlinks work correctly.
- Test literal backslashes in symlink targets from POSIX archives: Create an archive on POSIX with a symlink target containing a literal backslash (e.g., foo\bar as a single filename component), extract on Windows, and verify the backslash is replaced with %.
- Test hardlink handling: Verify that hardlinks are handled consistently across platforms with the forward-slash standard.
3. Extraction with Replacement Character
- Test extraction of files with literal backslashes from POSIX archives: Create an archive on POSIX with filenames containing literal backslashes, extract on Windows, and verify the backslashes are replaced with %.
- Test replacement character preservation: Verify that % is not converted back to \ during extraction on Windows.
- Test collision scenarios: Test edge cases where a POSIX archive contains two names that differ only in % versus \ (e.g., file%name.txt and file\name.txt), which map to the same name on Windows.
4. Security Checks
- Test \.. and ..\ rejection: Verify that paths containing \.. or ..\ are rejected on all platforms (POSIX and Windows).
- Test the security check in make_path_safe: Verify that the security check correctly identifies and rejects malicious patterns in Item.path only (NOT in Item.target, as per the specification in Section 1).
- Test cross-platform security: Create an archive on POSIX with paths containing \.. patterns, attempt to extract on Windows, and verify the paths are rejected.
5. Pattern Matching
- Test Windows-style pattern input behavior: Provide patterns with backslashes on Windows and verify the behavior (should be treated as literal characters, not separators).
- Test complex regex patterns: Verify that complex regex patterns work without normalization breaking them.
- Test pattern matching consistency: Verify that the same patterns produce the same results on Windows and POSIX.
7. Cross-Platform Archive Compatibility
- Test archives created on Windows, extracted on POSIX: Create an archive on Windows, extract on POSIX, and verify all paths are correct.
- Test archives created on POSIX, extracted on Windows: Create an archive on POSIX (including files with literal backslashes in names), extract on Windows, and verify the paths are correct (with % replacement).
- Test round-trip compatibility: Create an archive on one platform, extract on another, create a new archive, and verify the contents match.
8. Boundary Normalization Timing
- Test that get_strip_prefix receives normalized paths: Verify that get_strip_prefix only sees paths with / on Windows (no \).
- Test that remove_dotdot_prefixes receives normalized paths: Verify that remove_dotdot_prefixes only sees paths with / on Windows (no \).
- Test normalization order: Verify that boundary normalization happens before get_strip_prefix and remove_dotdot_prefixes are called.
9. Error Handling and User Feedback
- Test error messages for modified paths: Verify that users receive clear error messages when paths are modified (e.g., \ replaced with %).
- Test warnings during extraction: Verify that appropriate warnings are shown when extracting files with replacement characters.
- Test user-friendly error messages: Verify that error messages explain Windows path limitations clearly.
10. Path Normalization (posixpath.normpath vs os.path.normpath)
- Test that posixpath.normpath is used in all critical code paths: Verify that internal path normalization uses posixpath.normpath to maintain the forward-slash standard on all platforms.
- Test pattern matching with normalized paths: Verify that patterns work correctly after normalization with posixpath.normpath.
- Test that archive paths contain only forward slashes: Create archives on Windows and verify all stored paths use / as the separator (no \).
- Test get_strip_prefix with forward slashes: Verify that the slashdot hack (/./) works correctly and the prefix uses forward slashes.
- Test path collapsing behavior: Verify that ../ and ./ are collapsed correctly using posixpath.normpath on Windows.
- Regression test: Ensure no code accidentally reintroduces os.path.normpath in critical paths (patterns, archive creation, path manipulation).
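One way to implement the regression test is to scan module source with the standard ast module instead of a plain text search. find_forbidden_calls and the FORBIDDEN set are hypothetical. The scan does not catch aliased imports such as "from os.path import normpath". Because os.path.join remains acceptable for extraction destination paths, a real test would likely need a per-file allowlist.

```python
import ast

# Calls that can reintroduce "\" on Windows when applied to internal paths.
FORBIDDEN = {("os", "path", "normpath"), ("os", "path", "join"),
             ("os", "path", "abspath"), ("os", "path", "realpath")}


def _dotted(node):
    """Turn an attribute chain like os.path.normpath into ("os", "path", "normpath")."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return tuple(reversed(parts))
    return None


def find_forbidden_calls(source: str):
    """Return sorted (lineno, dotted name) pairs for forbidden calls in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _dotted(node.func)
            if name in FORBIDDEN:
                hits.append((node.lineno, ".".join(name)))
    return sorted(hits)
```

A test could read patterns.py, create_cmd.py, and the other files listed above and assert that find_forbidden_calls returns an empty list for each.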
Section 7: Verification Plan
Testing Approach
- Unit tests: Add unit tests for all helper functions (make_path_safe, to_safe_link_target, get_strip_prefix, remove_dotdot_prefixes).
- Integration tests: Add integration tests for archive creation, extraction, and pattern matching on Windows.
- Cross-platform tests: Run tests on both Windows and POSIX systems to verify cross-platform compatibility.
- Manual testing: Perform manual testing on Windows with real-world scenarios (symlinks, hardlinks, complex patterns, etc.).
Test Environment
- Primary: Native Windows environment (Windows 10 or later).
- Secondary: Windows Subsystem for Linux (WSL) for cross-platform testing, or a simulated Windows environment.
Success Criteria
- All existing tests pass on Windows.
- All new tests pass on Windows and POSIX.
- Archives created on Windows can be extracted on POSIX and vice versa.
- Symlinks and hardlinks work correctly on Windows.
- Pattern matching works consistently across platforms.
- Security checks prevent directory traversal attacks on all platforms.
Section 8: Implementation Order
Phase 1: Foundation
- Implement helper functions in fs.py (make_path_safe, to_safe_link_target).
- Update Item.path and Item.target encoding/decoding in item.pyx.
- Add security checks for \.. and ..\ patterns.
- Fix os.path.normpath → posixpath.normpath in fs.py:
  - In get_strip_prefix(): Replace os.path.normpath with posixpath.normpath (and os.sep with "/") in the return statement
  - Ensure remove_dotdot_prefixes() already uses posixpath.normpath
Phase 2: Boundary Normalization and Path Operations
- Update PathSpec in parseformat.py:
  - Apply boundary normalization (\ → /) for filesystem paths on Windows BEFORE calling posixpath.normpath
  - Replace os.path.normpath with posixpath.normpath for both patterns and filesystem paths
  - Replace os.path.abspath with os.getcwd() + boundary normalization + posixpath.normpath for filesystem paths
  - Note: Patterns get posixpath.normpath only; filesystem paths get boundary normalization first, then posixpath.normpath
- Update _rec_walk and do_create in create_cmd.py:
  - Apply boundary normalization (\ → /) for paths from the filesystem walker on Windows
  - Replace os.path.normpath with posixpath.normpath in both the do_create() and _rec_walk() methods
  - Replace os.path.join with posixpath.join when joining path components for archive paths (e.g., path with dirent.name, path with tag_name)
- Update remove_dotdot_prefixes in fs.py (remove redundant backslash normalization).
Phase 3: Pattern Matching and Archive Operations
- Fix os.path.normpath → posixpath.normpath in patterns.py:
  - All pattern classes: PathFullPattern, PathPrefixPattern, FnmatchPattern, ShellPattern
  - Replace all os.path.sep with "/" in pattern matching logic
- Update archive creation and extraction logic to use / exclusively.
- Fix os.path.normpath → posixpath.normpath in archive.py tar import functionality (for tarinfo.name and tarinfo.linkname).
- Update Archive.create_helper to use / for path operations.
Phase 4: Testing and Validation
- Add unit tests for all changes.
- Add integration tests for cross-platform compatibility.
- Add tests to verify posixpath.normpath is used instead of os.path.normpath in all critical paths.
- Perform manual testing on Windows.
Section 9: Documentation Updates
User-Facing Documentation
- Usage guide: Add notes about Windows path separator handling (forward slashes in patterns, backslash normalization).
- Changelog: Document path separator handling improvements for Windows.
Developer Documentation
- Architecture: Document the forward-slash standard and boundary normalization approach for path separators.
- Contributing guide: Add notes about path separator handling considerations for contributors.
Section 10: Known Limitations and Future Work
Known Limitations
- Replacement character collisions: Files with % and \ in their names on POSIX may collide on Windows.
- Windows drive letters: Handling of drive letters in archive paths is deferred.
- FUSE support: FUSE operations on Windows are deferred.
Future Work
- Windows drive letter handling: Implement proper handling of drive letters in archive paths (path separator considerations for absolute paths).
- FUSE support on Windows: Implement FUSE operations on Windows with proper path separator handling (if feasible).
Conclusion
This plan provides a comprehensive approach to improving Windows compatibility in Borg Backup. By adopting a forward-slash standard and implementing boundary normalization, we can simplify internal logic while maintaining cross-platform archive compatibility. The phased implementation approach ensures that changes are made incrementally and thoroughly tested.