This commit deprecates the earlier groupname[x-y] syntax in favour of
the inclusive groupname[x:y] syntax. It also makes the subscripting
code simpler and adds explanatory comments.
One problem addressed by the cleanup is that _enumeration_info used to
be called twice, and its results discarded the first time because of the
convoluted control flow.
The possibilities are complicated enough that I didn't want to make
changes without having a complete description of what it actually
accepts/matches. Note that this text documents current behaviour, not
necessarily the behaviour we want. Some of this is undocumented and may
not be intended.
Now we accept IPv6 addresses _with port numbers_ only in the standard
[xxx]:NN notation (though bare IPv6 addresses may be given, as before,
and non-IPv6 addresses may also be placed in square brackets), and any
other host identifiers (IPv4/hostname/host pattern) as before, with an
optional :NN suffix.
The new code parses INI-format inventory files in a single pass using a
well-documented state machine that reports precise errors and eliminates
the duplications and inconsistencies and outright errors in the earlier
three-phase parsing code (e.g. three ways to skip comments). It is also
much easier now to follow what decisions are being taken on the basis of
the parsed data. The comments point out various potential improvements,
particularly in the area of consistent IPv6 handling.
On the ornate marble tombstone of the old code, the following
inscription is one last baffling memento from a bygone age:
- def _before_comment(self, msg):
- ''' what's the part of a string before a comment? '''
- msg = msg.replace("\#","**NOT_A_COMMENT**")
- msg = msg.split("#")[0]
- msg = msg.replace("**NOT_A_COMMENT**","#")
- return msg
first off, we add an oddly slow basic test of 10k item inventory
Before:
```
Ran 229 tests in 13.214s
OK
real 0m13.403s
user 0m12.106s
sys 0m1.155s
```
After:
```
Ran 230 tests in 21.328s
OK
real 0m21.516s
user 0m20.099s
sys 0m1.275s
```
since that seems like a bit long for the test to add to runtime, lets profile
`python -m cProfile -s time ./bin/ansible all -i test/units/inventory_test_data/huge_range --list-hosts`
Before:
```
1272607 function calls (1259689 primitive calls) in 8.497 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 4.393 0.000 4.396 0.000 __init__.py:395(_get_host)
20000 2.695 0.000 2.697 0.000 __init__.py:341(__append_host_to_results)
40369 0.113 0.000 0.113 0.000 {posix.lstat}
50006 0.102 0.000 0.153 0.000 __init__.py:1490(combine_vars)
40008 0.089 0.000 0.202 0.000 __init__.py:1546(_load_vars_from_path)
20195 0.088 0.000 0.088 0.000 {posix.stat}
10011 0.087 0.000 0.087 0.000 {posix.getcwd}
```
The top two lines are promising optimization targets
- populate Inventory's host cache more in _get_host, as we are looping
over all the groups anyways.
- eliminate duplicate check of whether we've already included a host
in the construction around __append_host_to_results we can infer
presence of a host in the results list implies the presence of its
name in the hostnames set, allowing us to only to the less expensive
of the two checks
After:
```
1252610 function calls (1239692 primitive calls) in 1.320 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
40369 0.105 0.000 0.105 0.000 {posix.lstat}
50006 0.094 0.000 0.141 0.000 __init__.py:1490(combine_vars)
40008 0.081 0.000 0.184 0.000 __init__.py:1546(_load_vars_from_path)
10011 0.080 0.000 0.080 0.000 {posix.getcwd}
20195 0.074 0.000 0.074 0.000 {posix.stat}
10002 0.069 0.000 0.261 0.000 __init__.py:1517(load_vars)
```
This function takes a string like 'foo:bar[1:2]:baz[x:y]-quux' and
returns a list of patterns ['foo', 'bar[1:2]', 'baz[x:y]-quux'], i.e.
splits the string on colons that are not part of a range specification.
get_hosts → used externally, not changed
_get_hosts → _evaluate_patterns (takes a list, evaluates ! and &)
__get_hosts → _match_one_pattern (takes one pattern only, ignores !&)
This function takes a string like 'foo:bar[1:2]:baz[x:y]-quux' and
returns a list of patterns ['foo', 'bar[1:2]', 'baz[x:y]-quux'], i.e.
splits the string on colons that are not part of a range specification.
This was used earlier to implement serial, but that's now done using
restrict_to_hosts() (whose docstring is also suitably adjusted here)
and there are no more callers.
This is a slightly different fix than we originally committed, but fixes
the problem in a less invasive way (and I believe it's generally better
that we don't deal with relative paths internally past this point)
Fixes#11789