Some machines have system clocks which can fall behind (for instance,
a host without a CMOS battery like Raspberry Pi). When managing those
machines we have to work around the fact that the zip format does not
handle file timestamps before 1980. The workaround is to substitute in
the timestamp from the controller instead of from the managed machine.
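A minimal sketch of the idea (illustrative only, not the actual module_common code): when a file's timestamp predates 1980, stamp the zip entry with a known-good time instead.
```
import os
import time
import zipfile

def add_file(zf, path, arcname, fallback_time=None):
    # zip entries cannot encode timestamps before 1980, so fall back to
    # a sane clock (e.g. the controller's current time) when needed
    date_time = time.localtime(os.stat(path).st_mtime)[:6]
    if date_time[0] < 1980:
        date_time = time.localtime(fallback_time or time.time())[:6]
    info = zipfile.ZipInfo(arcname, date_time=date_time)
    with open(path, 'rb') as f:
        zf.writestr(info, f.read())
```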
Fixes #18640
Previous changes addressed a corner case, which unfortunately introduced
another bug. This patch adds a new flag to the host state (did_rescue) which
is set to true when the rescue portion of a block completes. This flag is
then checked in _check_failed_state() when the fail_state != FAILED_NONE.
This led to the discovery of another bug - current strategies are not advancing
hosts to ITERATING_COMPLETE after doing a peek at the next task, leaving the
host state in the run_state of the final task. To address this, before gathering
the list of failed hosts in StrategyBase.run(), a final pass through the iterator
for all hosts is done to ensure each host is in its final state. This way, no
strategy derived from StrategyBase has to worry about it; it is handled centrally.
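Roughly, the new check behaves like this (a hypothetical sketch of the state shape, not the real PlayIterator code):
```
FAILED_NONE = 0

class HostState(object):
    def __init__(self):
        self.fail_state = FAILED_NONE
        self.did_rescue = False   # set True once a block's rescue section completes

def check_failed_state(state):
    if state is None or state.fail_state == FAILED_NONE:
        return False
    # a completed rescue section absorbs the earlier failure
    if state.did_rescue:
        return False
    return True
```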
Fixes #17983
In order to support legacy plugins, the following two method signatures
are allowed for `CallbackBase.v2_playbook_on_start`:
def v2_playbook_on_start(self):
def v2_playbook_on_start(self, playbook):
Previously, the logic to handle this divergence checked to see if the
callback plugin being called supported an argument named `playbook`
in its `v2_playbook_on_start` method. This was fragile in a few ways:
- if a plugin author did not use the literal `playbook` to name their
method argument, their plugin would not be called correctly
- if a plugin author wrapped their `v2_playbook_on_start` method and
by doing so changed the argspec to no longer expose an argument
with that literal name, their plugin would not be called correctly
In order to continue to support both types of callback for backwards
compatibility while making the call more robust for plugin authors,
the logic can be reversed in order to have a positive check for the old
method signature instead of a positive check for the new one.
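A sketch of the reversed check (assuming inspect-based introspection; the real dispatch code may differ in detail):
```
import inspect

def call_playbook_on_start(plugin, playbook):
    method = plugin.v2_playbook_on_start
    getargspec = getattr(inspect, 'getfullargspec', inspect.getargspec)
    try:
        argspec = getargspec(method)
        # a bound method exposing only "self" means the legacy signature
        old_style = len(argspec.args) == 1 and argspec.varargs is None
    except TypeError:
        old_style = False
    if old_style:
        method()
    else:
        method(playbook)
```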
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
The PlayIterator was written without nested roles in mind, but since
include_role can nest them we need to check to see if we've moved into
a new role which is a child via nesting.
Fixes #18026
When using hostvars to get extra connection-specific vars for connection
plugins, use this raw lookup to avoid prematurely templating all of the
hostvar data (triggering unnecessary lookups).
Fixes #17024
* Pass the absolute path to dirname when assigning basedir
If no path is specified when calling the playbook, os.path.dirname(playbook_path) returns ''
This will cause failure when creating the retry file.
Fixes #17456
* Updated to use os.path.dirname(os.path.abspath())
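The difference, in short (illustrative example assuming the playbook was invoked as a bare filename):
```
import os

playbook_path = 'site.yml'   # invoked without any directory component
print(os.path.dirname(playbook_path))                    # '' -> retry file path breaks
print(os.path.dirname(os.path.abspath(playbook_path)))   # current working directory
```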
While doing evil things with action plugins, I hit a code path in which
the mkdir here was failing due to lack of parent dir. Changing this to
makedirs made everything happy. Now, I'd obviously like to understand
why the parent dir exists in some places and not others - but I could
not find anywhere that C.DEFAULT_LOCAL_TMP is ensured to be created.
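The change amounts to something like this (a sketch; the real code works on C.DEFAULT_LOCAL_TMP and goes through Ansible's own helpers):
```
import errno
import os

def ensure_local_tmp(path):
    # os.mkdir() fails when the parent is missing; os.makedirs() creates
    # the whole chain.  An already-existing directory is fine either way.
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
```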
* Add a new config option to cache the check for controlpersist on the
control machine.
Fixes #15844
* Remove the option and make the behavior the default
* Make the check for controlpersist cache its status per-ssh executable
We couldn't copy to_unicode, to_bytes, to_str into module_utils because
of licensing. So once we created it, we had two sets of functions that did
the same things but had different implementations. To remedy that, this
change removes the ansible.utils.unicode versions of those functions.
* dynamic role_include
* more fixes for dynamic include roles
* set play from iterator when dynamic
* changes from jimi-c
* avoid modules that break ad hoc
TODO: should really be a config
* adds squashing to objects, which allows them to be squashed down
to a final "view" before post_validate to avoid expensive evaluations
of parent attributes
* attempt #11 to role_include
* fixes from jimi-c
* do not override load_data, move all to load
* removed debugging
* implemented tasks_from parameter, must break cache
* fixed issue with cache and tasks_from
* make resolution of from_tasks prioritize literal
* avoid role dependency dedupe when include_role
* fixed role deps and handlers are now loaded
* simplified code, enabled k=v parsing
used example from jimi-c
* load role defaults for task when include_role
* fixed issue with from_tasks overriding all subdirs
* corrected priority order of main candidates
* made tasks_from a more generic interface to roles
* fix block inheritance and handler order
* allow vars: clause into included role
* pull vars already processed vs from raw data
* fix from jimi-c for blocks I broke
* added back append for dynamic includes
* only allow for basename in from parameter
* fix for docs when no default
* fixed notes
* added include_role to changelog
ran task_executor through python-modernize and then made changes to the
code pointed out by it:
* Most places where we looped through dict.keys() changed to
for key in dict:
Using keys() in python2 creates a list() of keys. For iterating, we
can iterate over the dict itself and we'll be handed back each key.
In python3, doing it this way does not create a new list and thus is
more memory efficient.
* In one place, use:
for key in list(dict.keys()):
because we're deleting elements from the dictionary inside of the
loop. So we really do need to iterate over a separate list of the
keys to avoid modifying the dictionary that we're iterating over.
(Fixes Python3 bug)
* In one place, change the order of an if-elif-else tree so that the
most frequent cases are evaluated first. (Optimization)
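The two patterns, shown with a generic dict rather than the task_executor internals:
```
d = {'a': 1, 'b': 2, 'c': 3}

# plain iteration: no list of keys is materialized (python2 or python3)
for key in d:
    print(key, d[key])

# when deleting inside the loop, snapshot the keys first so the dict is
# not mutated while being iterated (a hard error on python3)
for key in list(d.keys()):
    if d[key] > 1:
        del d[key]
```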
The calculation for max_fail_percentage was moved into the linear
strategy a while back, and works better there in the strategy layer
rather than at the PBE layer. This patch removes it from the PBE layer
and tweaks the logic controlling whether or not the next batch is run.
Fixes #15954
The flag new_pb_basedir is not being utilized in Inventory._get_hostgroup_vars,
leading to the situation where an inventory with no playbook basedir set will
read host/group vars from the $CWD, regardless of the inventory and/or playbook
relative location. This patch corrects that by not using the playbook basedir
if it is unset (None).
This patch also corrects a bug in which the VariableManager would accumulate
host/group vars files, which could lead to incorrect vars files being used when
playbooks are run from different directories containing their own group/host vars
directories.
Fixes #16953
We want to NOT consider the async task as failed if the result is
not parsed, which was the intent of:
https://github.com/ansible/ansible/pull/16458
However, the logic doesn't actually do that because we default
the 'parsed' value to True. It should default to False so that
we continue waiting, as intended.
Instead of immediately returning a failed code (indicating a break in
the play execution), we internally 'or' that failure code with the result
(now an integer flag instead of a boolean) so that we can properly handle
the rescue/always portions of blocks and still remember that the break
condition was hit.
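Conceptually (illustrative flag values, not the real strategy constants):
```
RUN_OK = 0
RUN_ERROR = 1
RUN_FAILED_HOSTS = 2
RUN_FAILED_BREAK_PLAY = 4

def run(task_batches):
    result = RUN_OK
    for batch in task_batches:
        # or-in each outcome instead of returning on the first failure, so
        # rescue/always sections still run while the break condition stays
        # recorded in the accumulated result
        result |= batch()
    return result

# usage: each batch is a callable returning one of the flags above
print(run([lambda: RUN_OK, lambda: RUN_FAILED_BREAK_PLAY, lambda: RUN_OK]))  # 4
```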
Fixes #16937
Rather than repeatedly searching for tasks by uuid via iterating over
all known blocks, cache the tasks when they are added to the PlayIterator
so the lookup becomes a simple key check in a dict.
When a task result has an empty results list, the
list should be ignored when determining the results
of `_check_key`. Here the empty list is treated the
same as a non-existent list.
This fixes a bug that manifests itself with squashed
items - namely the task result contains the correct
value for the key, but an empty results list. The
empty results list was treated as zero failures
when deciding which handler to call - so the task
shows as a success in the output, but is deemed to
have failed when deciding whether to continue.
This also demonstrates a mismatch between task
result processing and play iteration.
A test is also added for this case, but it would not
have caught the bug - because the bug is really in
the display, and not the success/failure of the
task (visually the test is more accurate).
Fixes ansible/ansible-modules-core#4214
This feature changes the scalar value of `serial:` to a list, which
allows users to specify a list of values, so batches can be ramped
up (commonly called "canary" setups):
- hosts: all
  serial: [1, 5, 10, "100%"]
  tasks:
    ...
I'm not sure why that would be desirable -- we really want __version__
to come from the controller whereas importing will come from the client
node. If it turns out there was a reason to do that, please be sure to
use an exception handler that catches all exceptions instead of only
catching ImportError:
```
try:
from ansible.release import __version__, __author__
except:
__version__ = [...]
```
Fixes #16523
* fixed lookup search path
added ansible_search_path var that contains the proper list and in order
removed roledir var which was only used by first_found, rest used role_path
added needle function for lookups that mirrors the action plugin one, now
both types of plugins use the same pathing.
* added missing os import
* renamed as per feedback
* fixed missing rename in first_found
* also fixed first_found
* fixed import to match new error class
* fixed getattr ref
Since Ansiballz, we no longer need to import basic directly into
a new-style module. Some modules, like the Networking modules, may
import basic in their own module_utils files and the module will import
that specialized module_util file rather than basic.
* Don't treat parsing problems as async task timeout
If there is a problem reading/writing the status file that manifests as
not being able to parse the data, that doesn't mean the task timed out,
it means there was what was likely a temporary problem. Move on and
keep polling for success. The only things that should cause the async
status to not be parseable are bugs in the async_runner.
* Add comment explaining not bailing out of loop
* Return different error when result is unparseable
* Remove extraneous else
* Instead of rebuilding the handler list all over the place, we now
compile the handlers at the point the play is post-validated so that
the view of the play in the PlayIterator contains the definitive list
* Assign the dep_chain to the handlers as they're compiling, just as we
do for regular tasks
* Clean up the logic used to find a given handler, which is greatly
simplified by the above changes
Fixes #15418
6eefc11c converted task.loop_control into an object, but while the other
callers were updated to use .loop_var instead of .get('loop_var'), this
site was overlooked.
This can be reproduced by including with loop_control a file that does
set_fact; a simple regression test along these lines is included.
Again, as we're carrying failed/unreachable hosts forward from play to play via
internal structures, we need to remember which ones had previously failed so that
unrelated host failures don't inflate the numbers for a given serial batch in the
PlaybookExecutor, causing a premature exit.
Fixes #16364
The listen statement on handlers should have supported a list, however
it was broken in the revision of the pub/sub feature based on the handler
revamp. This patch corrects the bug, so this works again:
- name: some handler
  ...
  listen:
    - some target
    - another target
Fixes #16378
Due to the fact that roles may be instantiated with different sets of
params (multiple inclusions of the same role or via role dependencies),
simply tracking notified handlers by name does not work. This patch
changes the way we track handler notifications by using the handler
object itself instead of just the name, allowing for multiple internal
instances. Normally this would be bad, but we also modify the way we
search for handlers by first looking at the notifying task's dependency
chain (ensuring that roles find their own handlers first) and then at
the main list of handlers, using the first match it finds.
This patch also modifies the way we setup the internal list of handlers,
which should allow us to correctly identify if a notified handler exists
more easily.
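A simplified sketch of the search order (hypothetical data shapes, not the real strategy code): the notifying task's role dependency chain is searched first, then the play-level handlers.
```
def find_notified_handler(notifying_task, play_handlers, handler_name):
    # dep_chain is ordered from the outermost parent role down to the
    # task's own role, so search it innermost-first
    for role in reversed(notifying_task.get('dep_chain', [])):
        for handler in role.get('handlers', []):
            if handler['name'] == handler_name:
                return handler          # a role finds its own handler first
    for handler in play_handlers:
        if handler['name'] == handler_name:
            return handler
    return None
```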
Fixes #15084
When setuptools installs a python module (as is done via python setup.py
install), it puts the module into a subdirectory of site-packages and
then creates an entry in easy-install.pth to load that directory. This
makes it difficult for Ansiballz to function correctly as the .pth file
overrides the sys.path that the wrapper constructs. Using
sitecustomize.py fixes this because sitecustomize overrides the
directories handled in .pth files.
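Conceptually, the generated sitecustomize.py just needs to win the sys.path race (a sketch with an illustrative path, not the literal generated file):
```
# sitecustomize.py -- imported by site.py *after* .pth files are processed,
# so this prepend takes precedence over the easy-install.pth entry
import sys

ANSIBALLZ_PAYLOAD_DIR = '/path/to/extracted/ansiballz/payload'   # illustrative
if ANSIBALLZ_PAYLOAD_DIR not in sys.path:
    sys.path.insert(0, ANSIBALLZ_PAYLOAD_DIR)
```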
Fixes #16187
* Fix: create retry_files_save_path if it doesn't exist
Ansible documentation states that retry_files_save_path directory will be
created if it does not already exist. It currently doesn't, so this patch
fixes it :)
* Use makedirs_safe to ensure thread-safe dir creation
@bcoca suggested to use the makedirs_safe helper function :)
This allows the PlaybookExecutor to receive more information regarding
what happened internal to the TaskQueueManager and strategy, to determine
things like whether or not the play iteration should stop.
Fixes #15523
Since we now use the PlayIterator to carry forward failures from previous
play executions, in the event that some hosts which had previously failed
are not in the current inventory we now create a stub state instead of
raising an error.
* Port urls.py to python3
Fixes (largely normalizing byte vs text strings) for python3
* Rework what we do with attributes that aren't set already.
* Comments
With some earlier changes, continuing to forward failed hosts on
to the iterator with each TQM run() call was causing plays with
max_fail_pct set to fail, as hosts which failed in previous plays
were counting those old failures against the % calculation.
Also changed the linear strategy's calculation to use the internal
failed list, rather than the iterator, as this now represents the
hosts failed during the current run only.
As noted in the comment, the TQM may be used for more than one play. As such,
after creating the new PlayIterator object it is necessary to mark any failed
hosts from previous calls to run() as failed in the iterator, so they are
properly skipped during any future calls to run().
Prior to this patch, the retry/until logic would fail any task that
succeeded if it took all of the allotted retries to succeed. This patch
reworks the retry/until logic to make things more simple and clear.
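The reworked flow, roughly (a simplified sketch, not the task_executor source): success on the final allowed attempt is still success; only exhausting every attempt marks the task failed.
```
import time

def run_with_retries(attempt_fn, retries=3, delay=0):
    result = {}
    for attempt in range(1, retries + 1):
        result = attempt_fn()
        if result.get('until_ok'):
            result['attempts'] = attempt
            return result                  # success, even on the last attempt
        if attempt < retries:
            time.sleep(delay)
    result['failed'] = True                # failed only after all attempts
    result['attempts'] = retries
    return result
```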
Fixes #15697
Previously the changed code was necessary, however it is now problematic
as we've started using the is_failed() method in other places in the code.
Additional changes at the strategy layer should make this safe to remove
now.
Fixes #15625
* Don't filter hosts remaining based on their failed state. Instead rely
on the PlayIterator to return None/ITERATING_COMPLETE when the host is
failed.
* In the free strategy, make sure we wait outside the host loop for all
pending results to be processed.
* Use the internal _set_failed_state() instead of manually setting things
when a failed child state is hit
Fixes #15623
In `lib/ansible/executor/play_iterator.py`, ansible sets a host's
`_gathered_facts` property to `True` without checking to see if there
are any tasks to be executed. In the event that the entire play is
skipped, `_gathered_facts` will be `True` even though the `setup`
module was never run.
This patch modifies the logic to only set `_gathered_facts` to `True`
when there are tasks to execute.
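The guard, in pseudocode terms (a sketch; the real change lives inside the PlayIterator setup):
```
def setup_play_for_host(host, tasks_for_host):
    # previously _gathered_facts was set unconditionally, so a fully
    # skipped play made later plays believe setup had already run
    if tasks_for_host:
        host._gathered_facts = True
    return tasks_for_host
```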
Closes #15744.
This changeset addresses the issue reported here:
ansible/ansible-modules-core#1765
The yum module (at least) includes its task results as strings, rather than
dicts, and the code this changeset replaces assumed that in that instance the
task was skipped. The updated behaviour assumes that the task has been
skipped only if:
* results exist, and
* all results are dicts that include a truthy skipped value
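The replacement check, roughly (a sketch of the predicate, not the exact callsite):
```
def task_skipped(results):
    # skipped only if every result is a dict with a truthy 'skipped' value;
    # string results (as yum produces) no longer count as skipped
    return bool(results) and all(
        isinstance(res, dict) and res.get('skipped')
        for res in results
    )

print(task_skipped([{'skipped': True}, {'skipped': True}]))  # True
print(task_skipped(['installed package foo']))               # False
print(task_skipped([]))                                       # False
```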
This was previously reinitialized every time we forked, so we weren't sharing
the same Locks. It also was not accounting for modules which were
directly invoked by an action plugin instead of going through the
strategy plugins.
* Make ziploader's ansible and ansible.module_utils libraries into
namespace packages.
* Move __version__ and __author__ from ansible/__init__ to
ansible/release.py. This is because namespace packages only load one
__init__.py. If that is not the __init__.py with the author and
version info then those won't be available.
* In ziploader, move the version into ANSIBLE_CONSTANTS.
* Change PluginLoader to properly construct the path to the plugins even
when namespace packages are present.
Previously we were first checking the fail/run state of the child
state for tasks/rescue/always portions of the block. Instead we are now
always recursively iterating over the child state and then evaluating
whether the child state is failed or complete before changing the failed/
run state within the current block.
Fixes #14324
* Move zipcache temp dir creation into the locked section otherwise it
races with other workers.
* Catch IOError and turn it into an AnsibleError. IOErrors can hang
multiprocessing.
Updated python module wrapper explode method to drop 'args' file next to module.
Both execute() and excommunicate() debug methods now pass the module args via file to enable debuggers that are picky about stdin.
Updated unit tests to use a context manager for masking/restoring default streams and argv.
rm __del__ as it might leak memory
renamed to tmp file cleanup
added exception handling when traversing the file list; even if one fails, try the rest
added cleanup to finally to ensure removal in most cases
When a serial percentage worked out to less than 1.0, Ansible would run
all hosts as the value for a rolling update.
The error is due to the fact that Python truncates a float that is under
1.0 to 0 on int conversion, which triggers the 0-host case, and the
0-host case tells Ansible to run all hosts.
The fix checks whether the percentage calculation is 0 after int
conversion and, if so, falls back to 1 host.
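The guard amounts to something like this (illustrative sketch; the real helper also accepts plain integers and '%'-suffixed strings):
```
def pct_to_int(value_pct, num_hosts):
    # int() truncates anything below 1.0 to 0, and a 0 batch size means
    # "all hosts" elsewhere, so clamp the batch to at least one host
    batch = int((value_pct / 100.0) * num_hosts)
    return batch if batch > 0 else 1

print(pct_to_int(10, 5))    # 0.5 hosts -> clamped to 1
print(pct_to_int(50, 10))   # 5
```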
This makes our recursive ast.parse performance measurements as fast as
the pre-ziploader baseline.
Since this unittest isn't testing that the returned module data is
correct, we don't need to worry about os.rename not having any module
data. We should devise a separate test for the module and caching code.
Some debuggers are easier to work with when we do everything in a single
process. This debug option caters to that at the expense of being
different from what Ansible will actually do to invoke a module.
When we document this we should be clear that this shouldn't be used for
general purpose debugging and that some modules may show strange
"errors" when used with this. Those won't be considered real bugs as
it's not how ansible really invokes the modules.
* Run the module as a script from the wrapper instead of executing in the same process.
Fixes corner cases where the module could potentially be executed twice
if we import and then run the main() function without calling sys.exit()
somewhere.
Also fixes a problem with concurrent.futures hanging. Not sure
precisely how this is being triggered, but it is related to invoking the
main() function outside of an if __name__ == '__main__' conditional
(see the sketch after this list).
* Fix for python-2.6
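In miniature, the wrapper-side change looks like this (generic sketch, not the actual wrapper): hand the extracted module to a fresh interpreter instead of importing it and calling main() in-process.
```
import subprocess
import sys

def run_module(module_path, args_path):
    # running as a separate script guarantees exactly one execution and a
    # clean sys.exit(), unlike importing the module and calling main()
    p = subprocess.Popen(
        [sys.executable, module_path, args_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = p.communicate()
    return p.returncode, stdout, stderr
```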
* Ziploader proof of concept (jimi-c)
* Cleanups to proof of concept ziploader branch:
* python3 compatible base64 encoding
* zipfile compression (still need to enable toggling this off for
systems without zlib support in python)
* Allow non-wildcard imports (still need to make this recursive so that
we can have module_utils code that imports other module_utils code.)
* Better tracebacks: module filename is kept and module_utils directory
is kept so that tracebacks show the real filenames that the errors
appear in.
* Make sure we import modules that are used into the module_utils files that they are used in.
* Set ansible version in a more pythonic way for ziploader than we were doing in module replacer
* Make it possible to set the module compression as an inventory var
This may be necessary on systems where python has been compiled without
zlib compression.
* Refactoring of module_common code:
* module replacer only replaces values that make sense for that type of
file (example: don't attempt to replace python imports if we're in
a powershell module).
* Implement configurable shebang support for ziploader wrapper
* Implement client-side constants (for SELINUX_SPECIAL_FS and SYSLOG)
via environment variable.
* Remove strip_comments param as we're never going to use it (ruins line
numbering)
* Don't repeat ourselves about detecting REPLACER
* Add an easy way to debug
* Port test-module to the ziploader-aware modify_module()
* strip comments and blank lines from the wrapper so we send less over the wire.
* Comments cleanup
* Remember to write out the module line itself in powershell modules
* `for line in lines` strips the newlines so we have to add them back in
* Can be configured in the ansible.cfg for tasks/handlers individually
* If an included filename contains no vars or loops, it will be expanded
in-place as if it were marked as static
* Unit tests exposed a problem where nested blocks did not correctly
hit rescue/always portions of parent blocks
* Cleaned up logic in PlayIterator
* Unfortunately fixing the above exposed a potential problem in the
block integration tests, where a failure in an "always" section may
always lead to a failed state and the termination of execution
beyond that point, so certain parts of the block integration test
were disabled.
main_q is not used anywhere in the codebase.
It is created in TaskQueueManager._initialize_processes, bundled with rslt_q
into TaskQueueManager._workers, later unwrapped in StrategyBase but not used.
This queue is closed in TaskQueueManager._cleanup_processes.
Historically, it is passed as an init parameter into WorkerProcess,
introduced in 62d7956, but this behavior is changed in 120b9a7.
Signed-off-by: 夏恺(Xia Kai) <xiaket@gmail.com>
- now workers pass the queue to task_executor so it can send back events per item and on each retry attempt
- updated result class to pass along events to strategy
- base strategy updated to forward new events to callback
- callbacks now remove 'items' on final result but process them directly when invoked per item
- new callback method to deal with retry attempt messages (also now obeys nolog)
- updated tests to match new signature of task_executor
fixes #14558, fixes #14072
* Fixes bug where the task was not marked as failed if the number of
retries was exceeded (#14461)
* Reorganizing logic to be a bit cleaner, and so retry messages are
shown before sleeping (which makes way more sense)
Fixes #14461, Fixes #14580
* Make sure dep chains are checked recursively for nested blocks
* Fixing iterator is_failed() check to make sure we're not in a
rescue block before returning True
* Use is_failed() to test whether a host should be added to the TQM
failed_hosts list
* Use is_failed() when compiling the list of hosts left to iterate
over in both the linear and free strategies
Fixes #14222
- moved to base cli class to handle centrally and duplicate less code
- now avoids duplication and reiteration of signal handler by reassigning it
- left note on how to do non-graceful in case we add in future
as I won't remember everything I did here and don't want to 'relearn' it.
- adhoc now terminates gracefully
- avoid race condition on termination by ignoring errors if the worker
might have been reaped between the active check and the termination call
- ansible-playbook now properly exits on sigint/term
- adhoc and playbook now give exceptions that we should not normally capture
and rely on top level finally to reap children
- handle systemexit breaks in workers
- added debug to see at which frame we exit
partial fix for #14346
* Fixed a bug in PlayIterator when ITERATING_ALWAYS, where the block
was advanced but the incorrect data structure elements were cleared
* Cleaned up the logic of is_failed() in PlayIterator
* Fixed a bug in the free strategy which had not been updated to use
the base strategy _execute_meta() method
* Stopped strategies from using is_failed() to determine if tasks should
still be fetched for a host
Fixes #14040