re.sub() repl notation explanation is disjointed #144884

@cben

Documentation

[I'm sending a PR proposing multiple changes; here I try to list the more "objective" issues, leaving my subjective choices to the PR...]

This concerns re.sub and, more briefly, re.Match.expand.
Information on how the repl argument is processed is split between the first paragraph ("repl can be a string or a function; if it is a string ...") and the second-to-last paragraph ("In string-type repl arguments, in addition ..."). The split is somewhat arbitrary, and some important subtleties are omitted:

  • "Unknown escapes" are discussed in the first paragraph, after only regular Python escapes like \n have been mentioned, before the sentence on \6 and long before \g<...> is introduced in the late paragraph.
  • \6 is introduced early, but the details of how \20 is parsed are added late, in the \g discussion. Ambiguities with octal notation (\02, \200, \2000) are not mentioned. A reader may guess this works like \2 vs. octal in regex notation.
  • The wording suggests that "all escapes" supported in Python string literals are processed. Not so: \x, \u, \U and \N aren't, unlike in regex notation. This bears on "unknown escape" handling.
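
To make the points above concrete, here is a small sketch of how repl templates behave. It assumes CPython 3.7 or later, where unknown escapes made of a backslash and an ASCII letter are errors; `try_sub` is a hypothetical helper written for this illustration, not part of the `re` module.

```python
import re

def try_sub(pattern, repl, string):
    """Return re.sub()'s result, or a short marker if the repl template is rejected."""
    try:
        return re.sub(pattern, repl, string)
    except re.error as exc:
        return f"error: {exc.msg}"

named = try_sub(r"(?P<w>a)", r"[\g<w>]", "a")   # '[a]'
explicit = try_sub(r"(a)", r"\g<1>0", "a")      # 'a0': group 1, then a literal '0'
two_digit = try_sub(r"(a)", r"\10", "a")        # rejected: read as group 10, not group 1 + '0'
octal3 = try_sub(r"(a)", r"\101", "a")          # 'A': three octal digits form a character code
octal0 = try_sub(r"a", r"\02", "a")             # '\x02': a leading 0 means octal
newline = try_sub(r"a", r"\n", "a")             # '\n': one of the few processed escapes
hex_esc = try_sub(r"a", r"\x41", "a")           # rejected: \x is not a template escape

# re.Match.expand() processes its template the same way.
expanded = re.match(r"(a)", "a").expand(r"\g<1>0")  # 'a0'

for name in ("named", "explicit", "two_digit", "octal3",
             "octal0", "newline", "hex_esc", "expanded"):
    print(f"{name:10} {globals()[name]!r}")
```

The `\10` and `\x41` cases are the ones a reader is least likely to predict from the current wording.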

Additionally, the final paragraph documenting flags (added in #119960) comes far after "The pattern may be a string or a Pattern", but the flags parameter is actually only allowed when pattern is a string, which is best explained by moving the two together.
[However, that dependence is worth mentioning in several other functions, so perhaps it deserves a separate issue/PR?]
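
A minimal sketch of the flags/pattern dependence, as I understand the current CPython behaviour (the variable names are only for illustration):

```python
import re

compiled = re.compile("a")

# With a string pattern, flags is accepted.
with_string = re.sub("a", "b", "A", flags=re.IGNORECASE)

# With a compiled Pattern, a non-zero flags value is rejected.
try:
    re.sub(compiled, "b", "A", flags=re.IGNORECASE)
    rejected = False
except ValueError:
    rejected = True

# The flags belong in re.compile() instead.
with_compiled = re.sub(re.compile("a", re.IGNORECASE), "b", "A")

print(with_string, rejected, with_compiled)
```

Documenting flags next to "The pattern may be a string or a Pattern" would let one sentence cover both cases.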
