Commit 7065e81

Merge pull request #3261 from Azure/release-2.12.0.0
Release 2.12.0.0 to master
2 parents acd2f73 + b79ceb8 commit 7065e81

239 files changed, +8239 -3048 lines changed

.github/PULL_REQUEST_TEMPLATE.md

+1
@@ -12,6 +12,7 @@ This will expedite the process of getting your pull request merged and avoid ext
 ---
 
 ### PR information
+- [ ] Ensure development PR is based on the `develop` branch.
 - [ ] The title of the PR is clear and informative.
 - [ ] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, [see this page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).
 - [ ] If applicable, the PR references the bug/issue that it fixes in the description.

.github/workflows/ci_pr.yml

+63-27
@@ -9,13 +9,13 @@ on:
 
 jobs:
   test-python-2_6-and-3_4-versions:
-
+
     strategy:
       fail-fast: false
       matrix:
         include:
-          - python-version: 2.6
-          - python-version: 3.4
+          - python-version: "2.6"
+          - python-version: "3.4"
 
     name: "Python ${{ matrix.python-version }} Unit Tests"
     runs-on: ubuntu-20.04
@@ -43,7 +43,7 @@ jobs:
 
       - name: Test with nosetests
         run: |
-          if [[ ${{ matrix.python-version }} == 2.6 ]]; then
+          if [[ ${{ matrix.python-version }} == "2.6" ]]; then
             source /home/waagent/virtualenv/python2.6.9/bin/activate
           else
             source /home/waagent/virtualenv/python3.4.8/bin/activate
@@ -87,30 +87,23 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - python-version: 3.5
-            PYLINTOPTS: "--rcfile=ci/3.6.pylintrc --ignore=tests_e2e,makepkg.py"
-
-          - python-version: 3.6
-            PYLINTOPTS: "--rcfile=ci/3.6.pylintrc --ignore=tests_e2e"
-
-          - python-version: 3.7
-            PYLINTOPTS: "--rcfile=ci/3.6.pylintrc --ignore=tests_e2e"
-
-          - python-version: 3.8
-            PYLINTOPTS: "--rcfile=ci/3.6.pylintrc --ignore=tests_e2e"
-
-          - python-version: 3.9
-            PYLINTOPTS: "--rcfile=ci/3.6.pylintrc"
+          - python-version: "3.5"
+            # workaround found in https://github.com/actions/setup-python/issues/866
+            # for issue "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:728)" on Python 3.5
+            pip_trusted_host: "pypi.python.org pypi.org files.pythonhosted.org"
+          - python-version: "3.6"
+          - python-version: "3.7"
+          - python-version: "3.8"
+          - python-version: "3.9"
             additional-nose-opts: "--with-coverage --cover-erase --cover-inclusive --cover-branches --cover-package=azurelinuxagent"
+          - python-version: "3.10"
+          - python-version: "3.11"
 
     name: "Python ${{ matrix.python-version }} Unit Tests"
     runs-on: ubuntu-20.04
 
     env:
-      PYLINTOPTS: ${{ matrix.PYLINTOPTS }}
-      PYLINTFILES: "azurelinuxagent setup.py makepkg.py tests tests_e2e"
       NOSEOPTS: "--with-timer ${{ matrix.additional-nose-opts }}"
-      PYTHON_VERSION: ${{ matrix.python-version }}
 
     steps:
 
@@ -121,26 +114,69 @@ jobs:
         uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
+        env:
+          PIP_TRUSTED_HOST: ${{ matrix.pip_trusted_host }}
 
       - name: Install dependencies
         id: install-dependencies
         run: |
           sudo env "PATH=$PATH" python -m pip install --upgrade pip
           sudo env "PATH=$PATH" pip install -r requirements.txt
           sudo env "PATH=$PATH" pip install -r test-requirements.txt
+          sudo env "PATH=$PATH" pip install --upgrade pylint
 
       - name: Run pylint
         run: |
-          pylint $PYLINTOPTS --jobs=0 $PYLINTFILES
+          #
+          # List of files/directories to be checked by pylint.
+          # The end-to-end tests run only on Python 3.9 and we lint them only on that version.
+          #
+          PYLINT_FILES="azurelinuxagent setup.py makepkg.py tests"
+          if [[ "${{ matrix.python-version }}" == "3.9" ]]; then
+            PYLINT_FILES="$PYLINT_FILES tests_e2e"
+          fi
 
-      - name: Test with nosetests
+          #
+          # Command-line options for pylint.
+          # * "unused-private-member" is not implemented on 3.5 and will produce "E0012: Bad option value 'unused-private-member' (bad-option-value)"
+          #   so we suppress "bad-option-value".
+          # * 3.9 will produce "no-member" for several properties/methods that are added to the mocks used by the unit tests (e.g
+          #   "E1101: Instance of 'WireProtocol' has no 'aggregate_status' member") so we suppress that warning.
+          # * On 3.9 pylint crashes when parsing azurelinuxagent/daemon/main.py (see https://github.com/pylint-dev/pylint/issues/9473), so we ignore it.
+          # * 'no-self-use' ("R0201: Method could be a function") was moved to an optional extension on 3.8 and is no longer used by default. It needs
+          #   to be suppressed for previous versions (3.0-3.7), though.
+          # * 'contextmanager-generator-missing-cleanup' are false positives if yield is used inside an if-else block for contextmanager generator functions.
+          #   (https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/contextmanager-generator-missing-cleanup.html).
+          #   This is not implemented on versions (3.0-3.7) Bad option value 'contextmanager-generator-missing-cleanup' (bad-option-value)
+          # * 3.9-3.11 will produce "too-many-positional-arguments" for several methods that are having more than 5 args, so we suppress that warning.
+          #   (R0917: Too many positional arguments (8/5) (too-many-positional-arguments))
+          PYLINT_OPTIONS="--rcfile=ci/pylintrc --jobs=0"
+          if [[ "${{ matrix.python-version }}" == "3.9" ]]; then
+            PYLINT_OPTIONS="$PYLINT_OPTIONS --disable=no-member,too-many-positional-arguments --ignore=main.py"
+          fi
+          if [[ "${{ matrix.python-version }}" =~ ^3\.(10|11)$ ]]; then
+            PYLINT_OPTIONS="$PYLINT_OPTIONS --disable=too-many-positional-arguments"
+          fi
+          if [[ "${{ matrix.python-version }}" =~ ^3\.[0-7]$ ]]; then
+            PYLINT_OPTIONS="$PYLINT_OPTIONS --disable=no-self-use,bad-option-value"
+          fi
+
+          echo "PYLINT_OPTIONS: $PYLINT_OPTIONS"
+          echo "PYLINT_FILES: $PYLINT_FILES"
+
+          pylint $PYLINT_OPTIONS $PYLINT_FILES
+
+      - name: Execute Unit Tests
         if: success() || (failure() && steps.install-dependencies.outcome == 'success')
         run: |
-          ./ci/nosetests.sh
-          exit $?
+          if [[ "${{ matrix.python-version }}" =~ ^3\.[1-9][0-9]+$ ]]; then
+            ./ci/pytest.sh
+          else
+            ./ci/nosetests.sh
+          fi
 
       - name: Compile Coverage
-        if: matrix.python-version == 3.9
+        if: matrix.python-version == '3.9'
         run: |
           echo looking for coverage files :
           ls -alh | grep -i coverage
@@ -149,7 +185,7 @@ jobs:
           sudo env "PATH=$PATH" coverage report
 
       - name: Upload Coverage
-        if: matrix.python-version == 3.9
+        if: matrix.python-version == '3.9'
         uses: codecov/codecov-action@v3
         with:
           file: ./coverage.xml

README.md

+8-1
@@ -100,7 +100,7 @@ Waagent depends on some system packages in order to function properly:
 * Filesystem utilities: sfdisk, fdisk, mkfs, parted
 * Password tools: chpasswd, sudo
 * Text processing tools: sed, grep
-* Network tools: ip-route
+* Network tools: ip-route, iptables
 
 ## Installation
 
@@ -568,6 +568,13 @@ OpenSSL commands. This signals OpenSSL to use any installed FIPS-compliant libra
 Note that the agent itself has no FIPS-specific code. _If no FIPS-compliant certificates are
 installed, then enabling this option will cause all OpenSSL commands to fail._
 
+#### __OS.EnableFirewall__
+
+_Type: Boolean_
+_Default: n (set to 'y' in waagent.conf)_
+
+Creates firewall rules to allow communication with the VM Host only by the Agent.
+
 #### __OS.MonitorDhcpClientRestartPeriod__
 
 _Type: Integer_

azurelinuxagent/agent.py

+47-28
@@ -23,14 +23,18 @@
 
 from __future__ import print_function
 
+import json
 import os
 import re
 import subprocess
 import sys
 import threading
+
+from azurelinuxagent.common.exception import CGroupsException
 from azurelinuxagent.ga import logcollector, cgroupconfigurator
-from azurelinuxagent.ga.cgroup import AGENT_LOG_COLLECTOR, CpuCgroup, MemoryCgroup
-from azurelinuxagent.ga.cgroupapi import SystemdCgroupsApi
+from azurelinuxagent.ga.cgroupcontroller import AGENT_LOG_COLLECTOR
+from azurelinuxagent.ga.cpucontroller import _CpuController
+from azurelinuxagent.ga.cgroupapi import get_cgroup_api, log_cgroup_warning, InvalidCgroupMountpointException
 
 import azurelinuxagent.common.conf as conf
 import azurelinuxagent.common.event as event
@@ -131,7 +135,7 @@ def daemon(self):
         """
         set_daemon_version(AGENT_VERSION)
         logger.set_prefix("Daemon")
-        threading.current_thread().setName("Daemon")
+        threading.current_thread().name = "Daemon"
         child_args = None \
             if self.conf_file_path is None \
                 else "-configuration-path:{0}".format(self.conf_file_path)
@@ -171,7 +175,7 @@ def run_exthandlers(self, debug=False):
         Run the update and extension handler
         """
         logger.set_prefix("ExtHandler")
-        threading.current_thread().setName("ExtHandler")
+        threading.current_thread().name = "ExtHandler"
 
         #
         # Agents < 2.2.53 used to echo the log to the console. Since the extension handler could have been started by
@@ -206,42 +210,57 @@ def collect_logs(self, is_full_mode):
 
         # Check the cgroups unit
         log_collector_monitor = None
-        cgroups_api = SystemdCgroupsApi()
-        cpu_cgroup_path, memory_cgroup_path = cgroups_api.get_process_cgroup_paths("self")
+        tracked_controllers = []
         if CollectLogsHandler.is_enabled_monitor_cgroups_check():
-            cpu_slice_matches = (cgroupconfigurator.LOGCOLLECTOR_SLICE in cpu_cgroup_path)
-            memory_slice_matches = (cgroupconfigurator.LOGCOLLECTOR_SLICE in memory_cgroup_path)
+            try:
+                cgroup_api = get_cgroup_api()
+            except InvalidCgroupMountpointException as e:
+                log_cgroup_warning("The agent does not support cgroups if the default systemd mountpoint is not being used: {0}".format(ustr(e)), send_event=True)
+                sys.exit(logcollector.INVALID_CGROUPS_ERRCODE)
+            except CGroupsException as e:
+                log_cgroup_warning("Unable to determine which cgroup version to use: {0}".format(ustr(e)), send_event=True)
+                sys.exit(logcollector.INVALID_CGROUPS_ERRCODE)
 
-            if not cpu_slice_matches or not memory_slice_matches:
-                logger.info("The Log Collector process is not in the proper cgroups:")
-                if not cpu_slice_matches:
-                    logger.info("\tunexpected cpu slice")
-                if not memory_slice_matches:
-                    logger.info("\tunexpected memory slice")
+            log_collector_cgroup = cgroup_api.get_process_cgroup(process_id="self", cgroup_name=AGENT_LOG_COLLECTOR)
+            tracked_controllers = log_collector_cgroup.get_controllers()
 
+            if len(tracked_controllers) != len(log_collector_cgroup.get_supported_controller_names()):
+                log_cgroup_warning("At least one required controller is missing. The following controllers are required for the log collector to run: {0}".format(log_collector_cgroup.get_supported_controller_names()))
                 sys.exit(logcollector.INVALID_CGROUPS_ERRCODE)
 
-        def initialize_cgroups_tracking(cpu_cgroup_path, memory_cgroup_path):
-            cpu_cgroup = CpuCgroup(AGENT_LOG_COLLECTOR, cpu_cgroup_path)
-            msg = "Started tracking cpu cgroup {0}".format(cpu_cgroup)
-            logger.info(msg)
-            cpu_cgroup.initialize_cpu_usage()
-            memory_cgroup = MemoryCgroup(AGENT_LOG_COLLECTOR, memory_cgroup_path)
-            msg = "Started tracking memory cgroup {0}".format(memory_cgroup)
-            logger.info(msg)
-            return [cpu_cgroup, memory_cgroup]
+            if not log_collector_cgroup.check_in_expected_slice(cgroupconfigurator.LOGCOLLECTOR_SLICE):
+                log_cgroup_warning("The Log Collector process is not in the proper cgroups", send_event=False)
+                sys.exit(logcollector.INVALID_CGROUPS_ERRCODE)
 
         try:
             log_collector = LogCollector(is_full_mode)
-            # Running log collector resource(CPU, Memory) monitoring only if agent starts the log collector.
+            # Running log collector resource monitoring only if agent starts the log collector.
             # If Log collector start by any other means, then it will not be monitored.
             if CollectLogsHandler.is_enabled_monitor_cgroups_check():
-                tracked_cgroups = initialize_cgroups_tracking(cpu_cgroup_path, memory_cgroup_path)
-                log_collector_monitor = get_log_collector_monitor_handler(tracked_cgroups)
+                for controller in tracked_controllers:
+                    if isinstance(controller, _CpuController):
+                        controller.initialize_cpu_usage()
+                        break
+                log_collector_monitor = get_log_collector_monitor_handler(tracked_controllers)
                 log_collector_monitor.run()
-            archive = log_collector.collect_logs_and_get_archive()
+
+            archive, total_uncompressed_size = log_collector.collect_logs_and_get_archive()
             logger.info("Log collection successfully completed. Archive can be found at {0} "
                         "and detailed log output can be found at {1}".format(archive, OUTPUT_RESULTS_FILE_PATH))
+
+            if log_collector_monitor is not None:
+                log_collector_monitor.stop()
+                try:
+                    metrics_summary = log_collector_monitor.get_max_recorded_metrics()
+                    metrics_summary['Total Uncompressed File Size (B)'] = total_uncompressed_size
+                    msg = json.dumps(metrics_summary)
+                    logger.info(msg)
+                    event.add_event(op=event.WALAEventOperation.LogCollection, message=msg, log_event=False)
+                except Exception as e:
+                    msg = "An error occurred while reporting log collector resource usage summary: {0}".format(ustr(e))
+                    logger.warn(msg)
+                    event.add_event(op=event.WALAEventOperation.LogCollection, is_success=False, message=msg, log_event=False)
+
         except Exception as e:
             logger.error("Log collection completed unsuccessfully. Error: {0}".format(ustr(e)))
             logger.info("Detailed log output can be found at {0}".format(OUTPUT_RESULTS_FILE_PATH))
@@ -328,7 +347,7 @@ def parse_args(sys_args):
         if arg == "":
             # Don't parse an empty parameter
             continue
-        m = re.match("^(?:[-/]*)configuration-path:([\w/\.\-_]+)", arg)  # pylint: disable=W1401
+        m = re.match(r"^(?:[-/]*)configuration-path:([\w/\.\-_]+)", arg)
         if not m is None:
             conf_file_path = m.group(1)
             if not os.path.exists(conf_file_path):
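
Two of the smaller changes in this file are easy to try in isolation: the regex in `parse_args` becomes a raw string (which removes the need for the `W1401` pylint suppression), and thread names are set through the `name` attribute instead of `setName()`. The sketch below is illustrative only; `extract_conf_path` and `rename_current_thread` are hypothetical helpers, not functions from the agent, and only the pattern itself is copied from the diff.

```python
import re
import threading

# Same pattern as in parse_args. The r"" prefix passes \w, \. and \- to the
# regex engine unchanged instead of relying on Python leaving unknown string
# escapes alone.
CONF_PATH_PATTERN = r"^(?:[-/]*)configuration-path:([\w/\.\-_]+)"


def extract_conf_path(arg):
    """Hypothetical helper: return the path from a configuration-path argument, or None."""
    m = re.match(CONF_PATH_PATTERN, arg)
    return m.group(1) if m is not None else None


def rename_current_thread(name):
    """Hypothetical helper: set the thread name via the attribute, as the diff now does."""
    threading.current_thread().name = name
    return threading.current_thread().name
```

For example, `extract_conf_path("-configuration-path:/etc/waagent.conf")` yields `/etc/waagent.conf`, while an unrelated argument such as `-verbose` yields `None`.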

azurelinuxagent/common/agent_supported_feature.py

+3-2
@@ -77,14 +77,15 @@ def __init__(self):
 class _GAVersioningGovernanceFeature(AgentSupportedFeature):
     """
     CRP would drive the RSM update if agent reports that it does support RSM upgrades with this flag otherwise CRP fallback to largest version.
-    Agent doesn't report supported feature flag if auto update is disabled or old version of agent running that doesn't understand GA versioning.
+    Agent doesn't report supported feature flag if auto update is disabled or old version of agent running that doesn't understand GA versioning
+    or if explicitly support for versioning is disabled in agent
 
     Note: Especially Windows need this flag to report to CRP that GA doesn't support the updates. So linux adopted same flag to have a common solution.
     """
 
     __NAME = SupportedFeatureNames.GAVersioningGovernance
     __VERSION = "1.0"
-    __SUPPORTED = conf.get_auto_update_to_latest_version() and conf.get_enable_ga_versioning()
+    __SUPPORTED = conf.get_auto_update_to_latest_version() and conf.get_enable_ga_versioning()
 
     def __init__(self):
         super(_GAVersioningGovernanceFeature, self).__init__(name=self.__NAME,
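
The new `__SUPPORTED` expression reports the feature only when both settings are enabled. A minimal sketch of that rule follows, with the two configuration getters replaced by plain boolean parameters; `ga_versioning_supported` and `SketchFeature` are hypothetical names used only for illustration. Note that a class-level attribute like this is evaluated once, when the class body runs, not on every access.

```python
def ga_versioning_supported(auto_update_to_latest_version, enable_ga_versioning):
    """Hypothetical helper: the feature is reported only if both flags are on."""
    return auto_update_to_latest_version and enable_ga_versioning


class SketchFeature(object):
    # Evaluated once at class-definition time, like __SUPPORTED in the diff.
    SUPPORTED = ga_versioning_supported(True, False)
```

Here `SketchFeature.SUPPORTED` is `False`, because versioning support is switched off even though auto update is on.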

azurelinuxagent/common/conf.py

+23-4
@@ -35,7 +35,7 @@ class ConfigurationProvider(object):
     """
 
     def __init__(self):
-        self.values = dict()
+        self.values = {}
 
     def load(self, content):
         if not content:
@@ -146,7 +146,8 @@ def load_conf_from_file(conf_file_path, conf=__conf__):
     "Debug.CgroupDisableOnQuotaCheckFailure": True,
     "Debug.EnableAgentMemoryUsageCheck": False,
     "Debug.EnableFastTrack": True,
-    "Debug.EnableGAVersioning": True
+    "Debug.EnableGAVersioning": True,
+    "Debug.EnableCgroupV2ResourceLimiting": False
 }
 
 
@@ -200,7 +201,8 @@ def load_conf_from_file(conf_file_path, conf=__conf__):
     "Debug.EtpCollectionPeriod": 300,
     "Debug.AutoUpdateHotfixFrequency": 14400,
     "Debug.AutoUpdateNormalFrequency": 86400,
-    "Debug.FirewallRulesLogPeriod": 86400
+    "Debug.FirewallRulesLogPeriod": 86400,
+    "Debug.LogCollectorInitialDelay": 5 * 60
 }
 
 
@@ -670,7 +672,7 @@ def get_enable_ga_versioning(conf=__conf__):
     If True, the agent looks for rsm updates(checking requested version in GS) otherwise it will fall back to self-update and finds the highest version from PIR.
     NOTE: This option is experimental and may be removed in later versions of the Agent.
     """
-    return conf.get_switch("Debug.EnableGAVersioning", False)
+    return conf.get_switch("Debug.EnableGAVersioning", True)
 
 
 def get_firewall_rules_log_period(conf=__conf__):
@@ -680,3 +682,20 @@ def get_firewall_rules_log_period(conf=__conf__):
     NOTE: This option is experimental and may be removed in later versions of the Agent.
     """
     return conf.get_int("Debug.FirewallRulesLogPeriod", 86400)
+
+
+def get_enable_cgroup_v2_resource_limiting(conf=__conf__):
+    """
+    If True, the agent will enable resource monitoring and enforcement for the log collector on machines using cgroup v2.
+    NOTE: This option is experimental and may be removed in later versions of the Agent.
+    """
+    return conf.get_switch("Debug.EnableCgroupV2ResourceLimiting", False)
+
+
+def get_log_collector_initial_delay(conf=__conf__):
+    """
+    Determine the initial delay at service start before the first periodic log collection.
+
+    NOTE: This option is experimental and may be removed in later versions of the Agent.
+    """
+    return conf.get_int("Debug.LogCollectorInitialDelay", 5 * 60)
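
The two new getters follow the existing pattern: look up a key and fall back to a hard-coded default. The sketch below shows that pattern with a stand-in provider. `SketchConfigurationProvider` is hypothetical and is not the agent's `ConfigurationProvider`; in particular, the `y`/`n` encoding of switches is an assumption based on the README wording ("set to 'y' in waagent.conf"), and the real `get_switch`/`get_int` may differ in detail.

```python
class SketchConfigurationProvider(object):
    """Hypothetical stand-in for the agent's ConfigurationProvider."""

    def __init__(self, values=None):
        self.values = dict(values or {})

    def get_switch(self, key, default_value):
        # Assumption: switches are stored as the strings 'y' and 'n'.
        value = self.values.get(key)
        if value is None:
            return default_value
        lowered = value.lower()
        if lowered == "y":
            return True
        if lowered == "n":
            return False
        return default_value

    def get_int(self, key, default_value):
        # Missing or malformed values fall back to the default.
        try:
            return int(self.values.get(key))
        except (TypeError, ValueError):
            return default_value


def get_enable_cgroup_v2_resource_limiting(conf):
    return conf.get_switch("Debug.EnableCgroupV2ResourceLimiting", False)


def get_log_collector_initial_delay(conf):
    return conf.get_int("Debug.LogCollectorInitialDelay", 5 * 60)


defaults = SketchConfigurationProvider()
overridden = SketchConfigurationProvider({
    "Debug.EnableCgroupV2ResourceLimiting": "y",
    "Debug.LogCollectorInitialDelay": "60",
})
```

With no overrides the getters return `False` and `300` (seconds); with the overrides above they return `True` and `60`.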
