
Merge pull request #3778 from lhuard1A/rh_subscription_resilient

Automatic merge from submit-queue

Make RH subscription more resilient to temporary failures

subscription-manager can sometimes fail because of server-side errors.
Manually replaying the command usually works.
So, let’s make openshift-ansible more resilient to temporary failures of
subscription-manager by retrying the failed commands with a maximum of
3 retries.
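
As a minimal sketch of the retry pattern, here is one of the affected tasks with the retry settings written out. The explicit `retries` and `delay` lines are an illustration and are not part of this change, which relies on `until` alone; the values shown are the ones Ansible falls back to when they are omitted.

```yaml
# Sketch only: `retries` and `delay` are spelled out for illustration.
# The actual change adds just the `until` line and relies on the defaults.
- name: Retrieve the OpenShift Pool ID
  command: subscription-manager list --available --matches="{{ rhel_subscription_pool }}" --pool-only
  register: openshift_pool_id
  # Re-run the task until the registered result reports success
  until: openshift_pool_id | succeeded
  # Maximum number of retries (Ansible's default when `until` is set)
  retries: 3
  # Seconds to wait between attempts (Ansible's default)
  delay: 5
  changed_when: False
```

A task that uses `until` must also use `register`, which is why the tasks that did not already register a result gain a `register` line as well.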

Here is an example of such sporadic errors:
```
TASK [rhel_subscribe : Retrieve the OpenShift Pool ID] *************************
ok: [lenaic-node-compute-c96e7]
ok: [lenaic-master-bbe09]
ok: [lenaic-node-compute-2976a]
fatal: [lenaic-node-infra-47ba5]: FAILED! => {"changed": false, "cmd": ["subscription-manager", "list", "--available", "--matches=Red Hat OpenShift Container Platform, Premium*", "--pool-only"], "delta": "0:00:07.152650", "end": "2017-04-04 11:24:59.729405", "failed": true, "rc": 70, "start": "2017-04-04 11:24:52.576755", "stderr": "Unable to verify server's identity: (104, 'Connection reset by peer')", "stdout": "", "stdout_lines": [], "warnings": []}

TASK [rhel_subscribe : Determine if OpenShift Pool Already Attached] ***********
skipping: [lenaic-master-bbe09]
skipping: [lenaic-node-compute-2976a]
skipping: [lenaic-node-compute-c96e7]

TASK [rhel_subscribe : fail] ***************************************************
skipping: [lenaic-node-compute-2976a]
skipping: [lenaic-master-bbe09]
skipping: [lenaic-node-compute-c96e7]

TASK [rhel_subscribe : Attach to OpenShift Pool] *******************************
fatal: [lenaic-node-compute-c96e7]: FAILED! => {"changed": true, "cmd": ["subscription-manager", "subscribe", "--pool", "8a85f9814ff0134a014ff43b44095513"], "delta": "0:00:21.421300", "end": "2017-04-04 11:25:20.655873", "failed": true, "rc": 70, "start": "2017-04-04 11:24:59.234573", "stderr": "Unable to verify server's identity: (104, 'Connection reset by peer')", "stdout": "Successfully attached a subscription for: Red Hat OpenShift Container Platform, Premium (1-2 Sockets)", "stdout_lines": ["Successfully attached a subscription for: Red Hat OpenShift Container Platform, Premium (1-2 Sockets)"], "warnings": []}
changed: [lenaic-master-bbe09]
changed: [lenaic-node-compute-2976a]
```

In this example, subscription-manager failed on some nodes, but not all. Retrying the command on the failed nodes would have avoided abandoning them.
OpenShift Merge Robot 7 years ago
commit 9934365323
1 changed file with 6 additions and 0 deletions

+ 6 - 0
roles/rhel_subscribe/tasks/main.yml

@@ -41,15 +41,19 @@
   redhat_subscription:
     username: "{{ rhel_subscription_user }}"
     password: "{{ rhel_subscription_pass }}"
+  register: rh_subscription
+  until: rh_subscription | succeeded
 
 - name: Retrieve the OpenShift Pool ID
   command: subscription-manager list --available --matches="{{ rhel_subscription_pool }}" --pool-only
   register: openshift_pool_id
+  until: openshift_pool_id | succeeded
   changed_when: False
 
 - name: Determine if OpenShift Pool Already Attached
   command: subscription-manager list --consumed --matches="{{ rhel_subscription_pool }}" --pool-only
   register: openshift_pool_attached
+  until: openshift_pool_attached | succeeded
   changed_when: False
   when: openshift_pool_id.stdout == ''
 
@@ -59,6 +63,8 @@
 
 - name: Attach to OpenShift Pool
   command: subscription-manager subscribe --pool {{ openshift_pool_id.stdout_lines[0] }}
+  register: subscribe_pool
+  until: subscribe_pool | succeeded
   when: openshift_pool_id.stdout != ''
 
 - include: enterprise.yml