The changes to exclude implicit localhosts from group patterns exposed
the bug that we sometimes create multiple implicit localhosts, which
caused some bugs with things like includes, where the host was used as
an entry into a dict, so having multiple meant that the incorrect host
(with a different uuid) was found and includes were not executed for
implicit localhosts.
* In the VariableManager, we were not properly tracking if a file
had already been loaded, so we continuously append data to the end
of the list there for host and group vars, meaning large sets of data
are duplicated multiple times
* In the inventory, we were merging the host/group vars with the vars
local to the host needlessly, as the VariableManager already handles that.
This leads to needless duplication of the data and makes combining the
vars in VariableManager take even longer.
Ansible excessively checks the file system for the potential presence of
`group_vars` and `host_vars` files.
For large numbers of groups this leads to combinatorial performance
issues.
This commit generates a set of group_vars and host_vars filenames using
`os.listdir()` in every possible location and then checks against the sets
before making a stat of the file system.
Also included in this commit is caching of the base directory lookup
for the inventory.
Issue #15633 observes that a meta: inventory_refresh task causes the playbook
to exit. An inventory refresh flushes all caches and rebuilds all host
objects, assigning new UUIDs to each. These new host UUIDs currently fail to
match those on host objects stored for restrictions in the inventory, causing
the playbook to exit for having no hosts to run further tasks against.
This changeset attempts to address this issue by storing host restrictions
by name, and comparing inventory host names against these names when applying
restrictions in get_hosts.
* now you can specify a yaml invenotry file
* ansible_group_priority will now set this property on groups
* added example yaml inventory
* TODO: make group var merging depend on priority
groups, child/parent relationships should remain unchanged.
The use of realpath means when following symlinks the actual path is
used when loading these files in the VariableManager, which may not
line up with the host or group name specified.
Fixes#14545
by moving to en-bloc unicode conversion to act on scripts stdout
Both python-json and simplejson always return unicode strings when using
their loads() method on unicode strings. This is true at least since
2009. This makes checking each substring unnecessary, because we do not
need to recursively check the strings contained in the inventory dict
later one-by-one
This commit makes parsing of large dynamic inventory at least 2 seconds
faster.
cf: https://github.com/towolf/ansible-large-inventory-testcase
This prevents a bug where the existing cache outside of the class
is not cleared when creating a new Inventory object. This only really
affects people using the API directly right now, but wanted to fix it
to prevent weird errors from popping up.
Letting it pass would just cause an error later on (no such file found)
so it's better to catch it here and know that we have users dealing with
non-utf8 pathnames than to have to track it down from later on.
Note that the fix for display normalizing to unicode is correct but the
fix for pathnames is probably not. Changing pathnames to unicode type
means that we will handle utf8 pathnames fine but pathnames can be any
sequence of bytes that do not contain null. We do not handle sequences
of bytes that are not valid utf8 here. To do that we need to revamp the
handling of basedir and paths to transform to bytes instead of unicode.
Didn't want to do that in 2.0.x as it will potentially introduce other
bugs as we find all the places that we combine basedir with other path
elements. Since no one has raised that as an issue thus far so it's not
something we need to handle yet. But it's something to keep in mind for
the future.
To test utf8 handling, create a utf8 directory and run a playbook from
within there.
To test non-utf8 handling (currently doesn't work as stated above), create
a directory with non-utf8 chars an run a playbook from there. In bash,
create that directory like this: mkdir $'\377'
Fixes#13937
and without hosts and vars
Without this patch, the simplified syntax is triggered when a group
is defined like this:
"platforms": {
"children": [
"cloudstack"
]
}
Which results in a group 'platforms' with 1 host 'platforms'.
more details in https://github.com/ansible/ansible/issues/13655
* Changed parse_addresses to throw exceptions instead of passing None
* Switched callers to trap and pass through the original values.
* Added very verbose notice
* Look at deprecating this and possibly validate at plugin instead
fixes#13608
Ansible previously added hosts to the host list multiple times for commands
like `ansible -i 'localhost,' -c local -m ping 'localhost,localhost'
--list-hosts`.
8d5f36a fixed the obvious error, but still added the un-deduplicated list to a
cache, so all future invocations of get_hosts() would retrieve a
non-deduplicated list.
This caused problems down the line: For some reason, Ansible only ever
schedules "flush_handlers" tasks (instead of scheduling any actual tasks from
the playbook) for hosts that are contained in the host lists multiple times.
This probably happens because the host states are stored in a dictionary
indexed by the hostnames, so duplicate hostname would cause the state to be
overwritten by subsequent invocations of … something.
Looks like there are two pattern caches that need to be cleared for this to work- added the second one.
Added integration tests for add_host to prevent future regressions.
* Always cache and return unique list objects, so that if the list
is changed later it does not impact the cached results
* Take additional parameters and the type of the pattern into account
when building the hash string
<crab> jimi|ansible: do you think it should be possible to add both
foo:22 and foo:23 to the inventory?
<jimi|ansible> no
…so we don't want an invitation to FIXME.
Since c8f2483d, ini.py expects to always be passed in a pre-created list
of groups, and can no longer deal sensibly with an empty list; this just
makes that expectation clear.
This fixes a corner case where ini files live in a subdir
of the main inventory directory.
Reproducing the original error:
mkdir -p inventory/ini
cat > inventory/ini/hosts << EOF
[www]
www1
EOF
$ ansible -i inventory/ all -m ping
ERROR! 'all'
(or without the [www] group, it would complain about 'ungrouped')
On Python 2, shlex.split() raises if you pass it a unicode object with
non-ASCII characters in it. The Ansible codebase copes by explicitly
converting the string using to_bytes() before passing it to
shlex.split().
On Python 3, shlex.split() raises ('bytes' object has no attribute 'read')
if you pass a bytes object. Oops.
This commit introduces a new wrapper function, shlex_split, that
transparently performs the to_bytes/to_unicode conversions only on
Python 2.
Currently I've only converted one call site (the one that was causing a
unit test to fail on Python 3). If this approach is deemed suitable,
I'll convert them all.
The earlier distinction was never used; .ipv6_address was always a copy
of .ipv4_address, and the latter was always used to set the remote_addr
field in the PlayContext.
Also uses the canonical ansible_host/ansible_port names when setting the
address and port from variables.
The earlier-recommended "pat1:pat2:pat3[x:y]" notation doesn't work well
with IPv6 addresses, so we recommend ',' as a separator instead. We know
that commas can't occur within a pattern, so we can just split on it.
We still have to accept the "foo:bar" notation because it's so commonly
used, but we issue a deprecation warning for it.
Fixes#12296Closes#12404Closes#12329
This adds a parse_address(pattern) utility function that returns
(host,port), and uses it wherever where we accept IPv4 and IPv6
addresses and hostnames (or host patterns): the inventory parser
the the add_host action plugin.
It also introduces a more extensive set of unit tests that supersedes
the old add_host unit tests (which didn't actually test add_host, but
only the parsing function).
Required some rewiring in inventory code to make sure we're using
the DataLoader class for some data file operations, which makes mocking
them much easier.
Also identified two corner cases not currently handled by the code, related
to inventory variable sources and which one "wins". Also noticed we weren't
properly merging variables from multiple group/host_var file locations
(inventory directory vs. playbook directory locations) so fixed as well.
Replace .iteritems() with six.iteritems() everywhere except in
module_utils (because there's no 'six' on the remote host). And except
in lib/ansible/galaxy/data/metadata_template.j2, because I'm not sure
six is available there.
This commit deprecates the earlier groupname[x-y] syntax in favour of
the inclusive groupname[x:y] syntax. It also makes the subscripting
code simpler and adds explanatory comments.
One problem addressed by the cleanup is that _enumeration_info used to
be called twice, and its results discarded the first time because of the
convoluted control flow.
The possibilities are complicated enough that I didn't want to make
changes without having a complete description of what it actually
accepts/matches. Note that this text documents current behaviour, not
necessarily the behaviour we want. Some of this is undocumented and may
not be intended.
Now we accept IPv6 addresses _with port numbers_ only in the standard
[xxx]:NN notation (though bare IPv6 addresses may be given, as before,
and non-IPv6 addresses may also be placed in square brackets), and any
other host identifiers (IPv4/hostname/host pattern) as before, with an
optional :NN suffix.
The new code parses INI-format inventory files in a single pass using a
well-documented state machine that reports precise errors and eliminates
the duplications and inconsistencies and outright errors in the earlier
three-phase parsing code (e.g. three ways to skip comments). It is also
much easier now to follow what decisions are being taken on the basis of
the parsed data. The comments point out various potential improvements,
particularly in the area of consistent IPv6 handling.
On the ornate marble tombstone of the old code, the following
inscription is one last baffling memento from a bygone age:
- def _before_comment(self, msg):
- ''' what's the part of a string before a comment? '''
- msg = msg.replace("\#","**NOT_A_COMMENT**")
- msg = msg.split("#")[0]
- msg = msg.replace("**NOT_A_COMMENT**","#")
- return msg
first off, we add an oddly slow basic test of 10k item inventory
Before:
```
Ran 229 tests in 13.214s
OK
real 0m13.403s
user 0m12.106s
sys 0m1.155s
```
After:
```
Ran 230 tests in 21.328s
OK
real 0m21.516s
user 0m20.099s
sys 0m1.275s
```
since that seems like a bit long for the test to add to runtime, lets profile
`python -m cProfile -s time ./bin/ansible all -i test/units/inventory_test_data/huge_range --list-hosts`
Before:
```
1272607 function calls (1259689 primitive calls) in 8.497 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 4.393 0.000 4.396 0.000 __init__.py:395(_get_host)
20000 2.695 0.000 2.697 0.000 __init__.py:341(__append_host_to_results)
40369 0.113 0.000 0.113 0.000 {posix.lstat}
50006 0.102 0.000 0.153 0.000 __init__.py:1490(combine_vars)
40008 0.089 0.000 0.202 0.000 __init__.py:1546(_load_vars_from_path)
20195 0.088 0.000 0.088 0.000 {posix.stat}
10011 0.087 0.000 0.087 0.000 {posix.getcwd}
```
The top two lines are promising optimization targets
- populate Inventory's host cache more in _get_host, as we are looping
over all the groups anyways.
- eliminate duplicate check of whether we've already included a host
in the construction around __append_host_to_results we can infer
presence of a host in the results list implies the presence of its
name in the hostnames set, allowing us to only to the less expensive
of the two checks
After:
```
1252610 function calls (1239692 primitive calls) in 1.320 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
40369 0.105 0.000 0.105 0.000 {posix.lstat}
50006 0.094 0.000 0.141 0.000 __init__.py:1490(combine_vars)
40008 0.081 0.000 0.184 0.000 __init__.py:1546(_load_vars_from_path)
10011 0.080 0.000 0.080 0.000 {posix.getcwd}
20195 0.074 0.000 0.074 0.000 {posix.stat}
10002 0.069 0.000 0.261 0.000 __init__.py:1517(load_vars)
```
This function takes a string like 'foo:bar[1:2]:baz[x:y]-quux' and
returns a list of patterns ['foo', 'bar[1:2]', 'baz[x:y]-quux'], i.e.
splits the string on colons that are not part of a range specification.
get_hosts → used externally, not changed
_get_hosts → _evaluate_patterns (takes a list, evaluates ! and &)
__get_hosts → _match_one_pattern (takes one pattern only, ignores !&)
This function takes a string like 'foo:bar[1:2]:baz[x:y]-quux' and
returns a list of patterns ['foo', 'bar[1:2]', 'baz[x:y]-quux'], i.e.
splits the string on colons that are not part of a range specification.
This was used earlier to implement serial, but that's now done using
restrict_to_hosts() (whose docstring is also suitably adjusted here)
and there are no more callers.
This is a slightly different fix than we originally committed, but fixes
the problem in a less invasive way (and I believe it's generally better
that we don't deal with relative paths internally past this point)
Fixes#11789