This should fix #7366 for now, using the (IMHO) pragmatic approach
of extending the sed expression to recognize strings.
However, this approach is obviously not parsing the full AST, nor does
it wrap Python itself (as pointed out by @spwhitt in #7366) but tries to
match Python strings as best as possible without getting TOO unreadable.
We also use a little bit of Nix to help generate the sed expression,
because doing the whole quote matching block over and over again would
be quite repetitious and error-prone to change. The reason why I'm using
imap here is that we need to have unique labels to avoid jumping into
the wrong branch.
So the new expression is not only able to match continuous regions of
triple-quoted strings, but also regions with only one quote character
(even with escaped inner quotes) and empty strings.
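As a rough illustration of the kinds of strings meant here, the following Python sketch matches a string literal at the start of some source text. It is not the sed expression from this commit; the pattern names and the `leading_string` helper are made up for this example, and a regex with `re.DOTALL` sees the whole text at once, whereas sed works line by line.

```python
import re

# Hypothetical patterns, for illustration only (not the sed expression).
# Triple-quoted alternatives come first so that the leading '""' of a
# triple-quoted string is not mistaken for an empty string.
TRIPLE_DQ = r'"""(?:[^\\]|\\.)*?"""'
TRIPLE_SQ = r"'''(?:[^\\]|\\.)*?'''"
# Single-quoted strings may contain escaped quotes but no bare newline.
SINGLE_DQ = r'"(?:[^"\\\n]|\\.)*"'
SINGLE_SQ = r"'(?:[^'\\\n]|\\.)*'"

STRING_RE = re.compile(
    "|".join([TRIPLE_DQ, TRIPLE_SQ, SINGLE_DQ, SINGLE_SQ]), re.DOTALL
)


def leading_string(source):
    """Return the string literal at the very start of `source`, or None."""
    match = STRING_RE.match(source)
    return match.group(0) if match else None
```

For instance, `leading_string('""\nimport os')` yields the empty string literal `""`, while `leading_string('import os')` yields `None`.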
However, what it doesn't correctly recognize is something like this:
"string1" "string2" "multi
line
string"
It is very unlikely that we'll find something like this in the wild.
Of course, we could handle it as well, but it would mean that we need to
substitute the current line into hold space until we're finished parsing
the strings, branch off to another label where we match multiline
strings of all sorts and swap hold/pattern space and finally print the
result. So to summarize: The sed expression would be 3 to 4 times bigger
than now and we gain very little from that.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Not really critical for anything we have in <nixpkgs> I guess, but
skipping lines three times really was a workaround and we're better off
just appending the lines ending with backslash to the pattern space so
we can accumulate all the "broken lines" until we reach the last one.
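The accumulation idea can be sketched in Python roughly as follows. This is only an analogy to appending lines to sed's pattern space, not the actual expression; `join_continuations` is a hypothetical helper name.

```python
def join_continuations(lines):
    """Merge each run of backslash-terminated lines with the line after it."""
    logical = []
    pending = []  # plays the role of the pattern space
    for line in lines:
        pending.append(line)
        # Keep accumulating while the current line ends with a backslash.
        if not line.rstrip("\n").endswith("\\"):
            logical.append("".join(pending))
            pending = []
    if pending:  # input ended in the middle of a continuation
        logical.append("".join(pending))
    return logical
```

Each element of the result is one complete "broken line" group, so later steps can treat it as a single unit.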
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The bazaar package is still broken even with 5f01cc7, because __future__
imports need to be the first imports before anything else. So this time
I'm going to make the sed expression with explicit branching so we can
properly match all the cases we want to skip and insert the line
modifying sys.argv[0] only _once_ and leave the command block after
that one substitution. So no ugly swaps between hold and pattern space.
The label which is responsible for not escaping the command block is "r"
and we jump to it as long as we need to skip something from the start of
the file.
While at it, I'm not only skipping every line with __future__ in it but
also lines ending with a backslash, so for example:
```python
from __future__ import shiny_feature1, \
shiny_feature2, \
shiny_feature3
```
... will now be properly skipped as well.
Tested against bazaar and nixops.
Thanks to @edolstra for reporting this.
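For readers who prefer Python over sed, the "skip from the start of the file, then insert exactly once" logic can be sketched like this. It is a simplification and not a translation of the sed expression: which lines count as skippable (comments, blank lines, anything mentioning `__future__`) is an assumption of this sketch, and `insert_after_prologue` is a hypothetical name.

```python
import re

# Assumption of this sketch: comment lines (including "#!"), blank lines
# and lines mentioning __future__ have to stay above the inserted statement.
SKIP_RE = re.compile(r"^\s*(#|$)|__future__")


def insert_after_prologue(lines, statement):
    """Insert `statement` once, before the first line that is not skipped."""
    out = []
    inserted = False
    continued = False  # True if the previous line ended with a backslash
    for line in lines:
        if not inserted and not continued and not SKIP_RE.search(line):
            out.append(statement)
            inserted = True  # from here on, lines are copied unchanged
        out.append(line)
        continued = line.rstrip("\n").endswith("\\")
    if not inserted:  # nothing but skippable lines: append at the end
        out.append(statement)
    return out
```

With a script that starts with a shebang followed by a backslash-continued `__future__` import, the statement lands after the last continuation line and appears only once in the output.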
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Fixes issues introduced by 24ef871e6a.
The problem here is that "import sys; sys.argv[0] = ..." is just
appended after the first "#!", which in turn breaks things such as
encoding specifications. A second problem - although not very common -
is when there's another #! within the script.
This should take care of both cases.
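To make the encoding problem concrete, here is a small Python sketch using the standard library's `tokenize.detect_encoding`. The script contents and the `'prog'` value are made up for illustration; the point is only that Python looks for an encoding declaration in the first two lines, so pushing it down to line three makes it go unnoticed.

```python
import io
import tokenize


def detected_encoding(lines):
    """Return the source encoding Python would detect for these byte lines."""
    readline = io.BytesIO(b"".join(lines)).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical script with an encoding declaration on line two.
original = [
    b"#!/usr/bin/python\n",
    b"# -*- coding: latin-1 -*-\n",
    b"print('hi')\n",
]

# Naive approach: put the statement directly after the first "#!" line,
# which moves the encoding declaration to line three.
naive = original[:1] + [b"import sys; sys.argv[0] = 'prog'\n"] + original[1:]
```

Comparing `detected_encoding(original)` with `detected_encoding(naive)` shows that the declared encoding is no longer picked up and the default is used instead.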
Signed-off-by: aszlig <aszlig@redmoonstudios.org>