We want to NOT consider the async task as failed if the result is
not parsed, which was the intent of:
https://github.com/ansible/ansible/pull/16458
However, the logic doesn't actually do that because we default
the 'parsed' value to True. It should default to False so that
we continue waiting, as intended.
Instead of immediately returning a failed code (indicating a break in
the play execution), we internally 'or' that failure code with the result
(now an integer flag instead of a boolean) so that we can properly handle
the rescue/always portions of blocks and still remember that the break
condition was hit.
Fixes#16937
Rather than repeatedly searching for tasks by uuid via iterating over
all known blocks, cache the tasks when they are added to the PlayIterator
so the lookup becomes a simple key check in a dict.
When a task result has an empty results list, the
list should be ignored when determining the results
of `_check_key`. Here the empty list is treated the
same as a non-existent list.
This fixes a bug that manifests itself with squashed
items - namely the task result contains the correct
value for the key, but an empty results list. The
empty results list was treated as zero failures
when deciding which handler to call - so the task
show as a success in the output, but is deemed to
have failed when deciding whether to continue.
This also demonstrates a mismatch between task
result processing and play iteration.
A test is also added for this case, but it would not
have caught the bug - because the bug is really in
the display, and not the success/failure of the
task (visually the test is more accurate).
Fixesansible/ansible-modules-core#4214
This feature changes the scalar value of `serial:` to a list, which
allows users to specify a list of values, so batches can be ramped
up (commonly called "canary" setups):
- hosts: all
serial: [1, 5, 10, "100%"]
tasks:
...
I'm not sure why that would be desirable -- we really want __version__
to come from the controller whereas importing will come from the client
node. If it turns out there was a reason to do that, please be sure to
use an exception handler that catches all exceptions instead of only
catching ImportError:
```
try:
from ansible.release import __version__, __author__
except:
__version__ = [...]
```
Fixes#16523
* fixed lookup search path
added ansible_search_path var that contains the proper list and in order
removed roledir var which was only used by first_found, rest used role_path
added needle function for lookups that mirrors the action plugin one, now
both types of plugins use same pathing.
* added missing os import
* renamed as per feedback
* fixed missing rename in first_found
* also fixed first_found
* fixed import to match new error class
* fixed getattr ref
Since Ansiballz, we no longer need to import basic directly into
a new-style module. Some modules, like the Networking modules, may
import basic in their own module_utils files and the module will import
that specialized module_util file rather than basic.
* Don't treat parsing problems as async task timeout
If there is a problem reading/writing the status file that manifests as
not being able to parse the data, that doesn't mean the task timed out,
it means there was what was likely a tempoarary problem. Move on and
keep polling for success. The only things that should cause the async
status to not be parseable are bugs in the async_runner.
* Add comment explaining not bailing out of loop
* Return different error when result is unparseable
* Remove extraneous else
* Instead of rebuilding the handler list all over the place, we now
compile the handlers at the point the play is post-validated so that
the view of the play in the PlayIterator contains the definitive list
* Assign the dep_chain to the handlers as they're compiling, just as we
do for regular tasks
* Clean up the logic used to find a given handler, which is greatly
simplified by the above changes
Fixes#15418
6eefc11c converted task.loop_control into an object, but while the other
callers were updated to use .loop_var instead of .get('loop_var'), this
site was overlooked.
This can be reproduced by including with loop_control a file that does
set_fact; a simple regression test along these lines is included.
Again, as we're carrying failed/unreachable hosts forward from play to play via
internal structures, we need to remember which ones had previously failed so that
unrelated host failures don't inflate the numbers for a given serial batch in the
PlaybookExecutor causing a premature exit.
Fixes#16364
The listen statement on handlers should have supported a list, however
it was broken in the revision of the pub/sub feature based on the handler
revamp. This patch corrects the bug, so this works again:
- name: some handler
...
listen:
- some target
- another target
Fixes#16378
Due to the fact that roles may be instantiated with different sets of
params (multiple inclusions of the same role or via role dependencies),
simply tracking notified handlers by name does not work. This patch
changes the way we track handler notifications by using the handler
object itself instead of just the name, allowing for multiple internal
instances. Normally this would be bad, but we also modify the way we
search for handlers by first looking at the notifying tasks dependency
chain (ensuring that roles find their own handlers first) and then at
the main list of handlers, using the first match it finds.
This patch also modifies the way we setup the internal list of handlers,
which should allow us to correctly identify if a notified handler exists
more easily.
Fixes#15084
When setuptools installs a python module (as is done via python setup.py
install) It puts the module into a subdirectory of site-packages and
then creates an entry in easy-install.pth to load that directory. This
makes it difficult for Ansiballz to function correctly as the .pth file
overrides the sys.path that the wrapper constructs. Using
sitecustomize.py fixes this because sitecustomize overrides the
directories handled in .pth files.
Fixes#16187
* Fix: create retry_files_save_path if it doesn't exist
Ansible documentation states that retry_files_save_path directory will be
created if it does not already exist. It currently doesn't, so this patch
fixes it :)
* Use makedirs_safe to ensure thread-safe dir creation
@bcoca suggested to use the makedirs_safe helper function :)
This allows the PlaybookExecutor to receive more information regarding
what happened internal to the TaskQueueManager and strategy, to determine
things like whether or not the play iteration should stop.
Fixes#15523
Since we now use the PlayIterator to carry forward failures from previous
play executions, in the event that some hosts which had previously failed
are not in the current inventory we now create a stub state instead of
raising an error.
* Port urls.py to python3
Fixes (largely normalizing byte vs text strings) for python3
* Rework what we do with attributes that aren't set already.
* Comments
With some earlier changes, continuing to forward failed hosts on
to the iterator with each TQM run() call was causing plays with
max_fail_pct set to fail, as hosts which failed in previous plays
were counting those old failures against the % calculation.
Also changed the linear strategy's calculation to use the internal
failed list, rather than the iterator, as this now represents the
hosts failed during the current run only.
As noted in the comment, the TQM may be used for more than one play. As such,
after creating the new PlayIterator object it is necessary to mark any failed
hosts from previous calls to run() as failed in the iterator, so they are
properly skipped during any future calls to run().
Prior to this patch, the retry/until logic would fail any task that
succeeded if it took all of the alloted retries to succeed. This patch
reworks the retry/until logic to make things more simple and clear.
Fixes#15697
Previously the changed code was necessary, however it is now problematic
as we've started using the is_failed() method in other places in the code.
Additional changes at the strategy layer should make this safe to remove
now.
Fixes#15625
* Don't filter hosts remaining based on their failed state. Instead rely
on the PlayIterator to return None/ITERATING_COMPLETE when the host is
failed.
* In the free strategy, make sure we wait outside the host loop for all
pending results to be processed.
* Use the internal _set_failed_state() instead of manually setting things
when a failed child state is hit
Fixes#15623
In `lib/ansible/executor/play_iterator.py`, ansible sets a host's
`_gathered_facts` property to `True` without checking to see if there
are any tasks to be executed. In the event that the entire play is
skipped, `_gathered_facts` will be `True` even though the `setup`
module was never run.
This patch modifies the logic to only set `_gathered_facts` to `True`
when there are tasks to execute.
Closes#15744.
This changeset addresses the issue reported here:
ansible/ansible-modules-core#1765
The yum module (at least) includes its task results as strings, rather than
dicts, and the code this changeset replaces assumed that in that instance the
task was skipped. The updated behaviour assumes that the task has been
skipped only if:
* results exist, and
* all results are dicts that include a truthy skipped value
This was reinitialized every time we forked before so we weren't sharing
the same Locks. It also was not accounting for modules which were
directly invoked by an action plugin instead of going through the
strategy plguins.
* Make ziploader's ansible and ansible.module_utils libraries into
namespace packages.
* Move __version__ and __author__ from ansible/__init__ to
ansible/release.py. This is because namespace packages only load one
__init__.py. If that is not the __init__.py with the author and
version info then those won't be available.
* In ziplaoder, move the version ito ANSIBLE_CONSTANTS.
* Change PluginLoader to properly construct the path to the plugins even
when namespace packages are present.
Previously we were first checking the fail/run state of the child
state for tasks/rescue/always portions of the block. Instead we are now
always recursively iterating over the child state and then evaluating
whether the child state is failed or complete before changing the failed/
run state within the current block.
Fixes#14324
* Move zipcache temp dir creation into the locked section otherwise it
races with other workers.
* Catch IOError and turn it into an AnsibleError. IOErrors can hang
multiprocessng.
Updated python module wrapper explode method to drop 'args' file next to module.
Both execute() and excommunicate() debug methods now pass the module args via file to enable debuggers that are picky about stdin.
Updated unit tests to use a context manager for masking/restoring default streams and argv.
rm _del_ as it might leak memory
renamed to tmp file cleanup
added exception handling when traversing file list, even if one fails try rest
added cleanup to finally to ensure removal in most cases
Ansible when there was a percentage that was calculated to be less than
1.0 would run all hosts as the value for a rolling update.
The error is due to the fact that Python will round a
float that is under 1.0 to 0, which will trigger the case of
0 hosts. The 0 host case tells ansible to run all hosts.
The fix will see if the percentage calculation after int
conversion is 0 and will else to 1 host.
This makes our recursive, ast.parse performance measures as fast as
pre-ziploader baseline.
Since this unittest isn't testing that the returned module data is
correct we don't need to worry about os.rename not having any module
data. Should devise a separate test for the module and caching code
Some debuggers are easier to work with when we do everything in a single
process. This debug option caters to that at the expense of being
different from what Ansible will actually do to invoke a module.
When we document this we should be clear that this shouldn't be used for
general purpose debugging and that some modules may show strange
"errors" when used with this. Those won't be considered real bugs as
it's not how ansible really invokes the modules.
* Run the module as a script from the wrapper instead of executing in the same process.
Fixes cornercases where the module could potentially be executed twice
if we import and then run the main() function without calling sys.exit()
somewhere.
Also fixes problem with concurrent.futures() hanging. Not sure
precisely how this is being triggered but it is related to invoking the
main() function outside of an if __name__ == '__main__' conditional.
* Fix for python-2.6
* Ziploader proof of concept (jimi-c)
* Cleanups to proof of concept ziploader branch:
* python3 compatible base64 encoding
* zipfile compression (still need to enable toggling this off for
systems without zlib support in python)
* Allow non-wildcard imports (still need to make this recusrsive so that
we can have module_utils code that imports other module_utils code.)
* Better tracebacks: module filename is kept and module_utils directory
is kept so that tracebacks show the real filenames that the errors
appear in.
* Make sure we import modules that are used into the module_utils files that they are used in.
* Set ansible version in a more pythonic way for ziploader than we were doing in module replacer
* Make it possible to set the module compression as an inventory var
This may be necessary on systems where python has been compiled without
zlib compression.
* Refactoring of module_common code:
* module replacer only replaces values that make sense for that type of
file (example: don't attempt to replace python imports if we're in
a powershell module).
* Implement configurable shebang support for ziploader wrapper
* Implement client-side constants (for SELINUX_SPECIAL_FS and SYSLOG)
via environment variable.
* Remove strip_comments param as we're never going to use it (ruins line
numbering)
* Don't repeat ourselves about detecting REPLACER
* Add an easy way to debug
* Port test-module to the ziploader-aware modify_module()
* strip comments and blank lines from the wrapper so we send less over the wire.
* Comments cleanup
* Remember to output write the module line itself in powershell modules
* for line in lines strips the newlines so we have to add them back in
* Can be configured in the ansible.cfg for tasks/handlers individually
* If an included filename contains no vars or loops, it will be expanded
in-place as if it were marked as static
* Unit tests exposed a problem where nested blocks did not correctly
hit rescue/always portions of parent blocks
* Cleaned up logic in PlayIterator
* Unfortunately fixing the above exposed a potential problem in the
block integration tests, where a failure in an "always" section may
always lead to a failed state and the termination of execution
beyond that point, so certain parts of the block integration test
were disabled.
main_q is not used anywhere in the codebase.
It is created in TaskQueueManager._initialize_processes, bundled with rslt_q
into TaskQueueManger._workers, later unwrapped in StrategyBase but not used.
This queue is closed in TaskQueueManger._cleanup_processes.
Historically, it is passed as a init parameter into WorkerProcess,
introduced in 62d7956, but this behavior is changed in 120b9a7.
Signed-off-by: 夏恺(Xia Kai) <xiaket@gmail.com>
- now workers passes queue to task_executor so it can send back events per item and on retry attempt
- updated result class to pass along events to strategy
- base strategy updated to forward new events to callback
- callbacks now remove 'items' on final result but process them directly when invoked per item
- new callback method to deal with retry attempt messages (also now obeys nolog)
- updated tests to match new signature of task_executor
fixes#14558fixes#14072
* Fixes bug where the task was not marked as failed if the number of
retries were exceeded (#14461)
* Reorganizing logic to be a bit cleaner, and so retrie messages are
shown before sleeping (which makes way more sense)
Fixes#14461Fixes#14580
* Make sure dep chains are checked recursively for nested blocks
* Fixing iterator is_failed() check to make sure we're not in a
rescue block before returning True
* Use is_failed() to test whether a host should be added to the TQM
failed_hosts list
* Use is_failed() when compiling the list of hosts left to iterate
over in both the linear and free strategies
Fixes#14222
- moved to base cli class to handle centrally and duplicate less code
- now avoids duplication and reiteration of signal handler by reassigning it
- left note on how to do non-graceful in case we add in future
as I won't remember everything i did here and don't want to 'relearn' it.
- adhoc now terminates gracefully
- avoid race condition on terminations by ignoring errors if
worker might have been reaped between checking if active and termination call
- ansible-playbook now properly exits on sigint/term
- adhoc and playbook now give exceptions that we should not normally capture
and rely on top level finally to reap children
- handle systemexit breaks in workers
- added debug to see at which frame we exit
partial fix for #14346
* Fixed a bug in PlayIterator when ITERATING_ALWAYS, where the block
was advanced but the incorrect data structure elements were cleared
* Cleaned up the logic of is_failed() in PlayIterator
* Fixed a bug in the free strategy which had not been updated to use
the base strategy _execute_meta() method
* Stopped strategies from using is_failed() to determine if tasks should
still be fetched for a host
Fixes#14040
The dep chain for roles created during the compile step had bugs, in
which the dep chain was overwriten and the original tasks in the role
were not assigned a dep chain. This lead to problems in determining
whether roles had already run when in a "diamond" structure, and in
some cases roles were not correctly getting variables from parents.
Fixes#14046
* Don't re-use the existing connection if the remote_addr field of
the play context has changed
* When overriding variables in PlayContext (from task/variables),
don't set the same attribute based on a different variable name
if we had already previously set it from another variable name
Fixes#13880
* Relocate the assignment of the host address to the remote_addr field
in the play context, which was only done when the connection was created
(it's now done after the post_validate() is called on the play context)
* Make the assignment of the play context to the connection an else, since
it's not required if the connection is not reused
If this isnt updated, the _connection is reused, and thus has an outdated _play_context
This results in outdated `success_key` and `prompt` causing issues if sudo is run in a loop
Refer to the issue #13763 for more debugging and details
pushed it to use the existing propmpt from display and moved the vars prompt code there also for uniformity
changed vars_prompt to check extra vars vs the empty play.vars to restore 1.9 behaviour
sipmlified the code as it didn't need to check for syntax again (tqm is made none prior based on that)
fixes#13770
Still is a warning as we don't want to repeat it multiple times nor additional callbacks to stop ansible execution.
hopefully we can avoid shipping w/o exceptions in the default/minimal callbacks...
Also added feature that now allows for 'preformated' strings passed to warning
* Added additional methods to the iterator code to assess host failures
while also taking into account the block rescue/always states
* Fixed bugs in the free strategy, where results were not always being
processed after being collected
* Added some prettier printing to the state output from iterator
Fixes#13699
Because the fail_state is potentially non-zero in these block sections,
the prior logic led to included tasks not being inserted at all.
Related issue: #13605
* Saving of the registered variable was occuring after the tests for
changed/failed_when.
* Each of the above fields and until were being post_validated too early,
so variables which were not defined at that time were causing task
failures.
Fixes#13591
* Move self._tqm.load_callbacks() earlier to ensure that v2_on_playbook_start can fire
* Pass the playbook instance to v2_on_playbook_start
* Add a _file_name instance attribute to the playbook
The code isn't sophisticated enough to understand lists and dicts yet.
This mirrors how 1.9.x handled non-string items so its not a regression.
One portion of a fix for #12976
* Properly mark hosts with failures in includes as failed
* Don't send callbacks until we're sure we're done, and also fix how
we increment stats so failures don't show up as ok's
* Fix a bug in the include file logic where a failed include could lead
to an infinite loop in the task iteration logic
Fixes#12933