Initial version of platform-independent CDS extractor #169

Open: wants to merge 27 commits into main
Changes from 5 commits
Commits (27):
31927be WIP make the CDS extractor platform independent (data-douser, Jan 23, 2025)
c50380c Merge branch 'advanced-security:main' into data-douser/extractor-dev-2 (data-douser, Jan 23, 2025)
db64e01 Merge branch 'advanced-security:main' into data-douser/extractor-dev-2 (data-douser, Jan 23, 2025)
54938b6 Improve extractor script comments and file paths (data-douser, Jan 28, 2025)
ea229d1 Add requested comment to extractor.js responsFiles (data-douser, Jan 28, 2025)
aed1fc6 WIP fixes for CDS extractor rewrite (data-douser, Jan 29, 2025)
d02234e Fix CDS extractor JS undefined var (data-douser, Jan 30, 2025)
a17b8c2 Set cwd for CDS JS autobuild process (data-douser, Jan 30, 2025)
0b6994c Another attempted fix for index-files.js cwd (data-douser, Feb 4, 2025)
2673f53 Document index-files change for grandfathered package.json (data-douser, Feb 5, 2025)
212820a Remove shell quote from index-files logging (data-douser, Feb 5, 2025)
cc99914 index-files.js must compiles cds to file (not dir) (data-douser, Feb 5, 2025)
5d16df0 Attempted fix for missing CDS SARIF results (data-douser, Feb 5, 2025)
c9e02ed WIP make the CDS extractor platform independent (data-douser, Jan 23, 2025)
9332311 Improve extractor script comments and file paths (data-douser, Jan 28, 2025)
a9b987a Add requested comment to extractor.js responsFiles (data-douser, Jan 28, 2025)
5a49a25 WIP fixes for CDS extractor rewrite (data-douser, Jan 29, 2025)
caaadbb Fix CDS extractor JS undefined var (data-douser, Jan 30, 2025)
c9c7cde Set cwd for CDS JS autobuild process (data-douser, Jan 30, 2025)
c213d6b Another attempted fix for index-files.js cwd (data-douser, Feb 4, 2025)
4f234be Document index-files change for grandfathered package.json (data-douser, Feb 5, 2025)
b614751 Remove shell quote from index-files logging (data-douser, Feb 5, 2025)
08f8624 index-files.js must compiles cds to file (not dir) (data-douser, Feb 5, 2025)
dbc3ba7 Attempted fix for missing CDS SARIF results (data-douser, Feb 5, 2025)
c8b643c Merge branch 'data-douser/extractor-dev-2' of github.com:data-douser/… (data-douser, Feb 24, 2025)
c410382 Merge branch 'main' into data-douser/extractor-dev-2 (data-douser, Mar 4, 2025)
33d51d0 Improve handling of cds compile output files (data-douser, Mar 4, 2025)
12 changes: 12 additions & 0 deletions extractors/cds/tools/autobuild.cmd
@@ -0,0 +1,12 @@
@echo off

type NUL && "%CODEQL_DIST%\codeql" database index-files ^
--include-extension=.cds ^
--language cds ^
--prune **\node_modules\**\* ^
--prune **\.eslint\**\* ^
--total-size-limit=10m ^
-- ^
"%CODEQL_EXTRACTOR_CDS_WIP_DATABASE%"

exit /b %ERRORLEVEL%
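autobuild.cmd above and autobuild.sh below hand the same options to `codeql database index-files`; only the glob separators and the line-continuation syntax differ. A minimal sketch, assuming one wanted a single JavaScript source of truth for those options: the hypothetical `buildIndexFilesArgs` helper (not part of this PR) returns the argument vector. The glob separator is a parameter because which separator `codeql` expects for `--prune` on Windows is an assumption I have not verified.

```javascript
// Hypothetical helper (not part of this PR): the argument list that
// autobuild.sh / autobuild.cmd pass to `codeql database index-files`.
function buildIndexFilesArgs(databasePath, sep = '/') {
  if (!databasePath) {
    throw new Error('databasePath (CODEQL_EXTRACTOR_CDS_WIP_DATABASE) is required');
  }
  // Build a glob such as **/node_modules/**/* with the requested separator.
  const glob = (...parts) => parts.join(sep);
  return [
    'database', 'index-files',
    '--include-extension=.cds',
    '--language', 'cds',
    '--prune', glob('**', 'node_modules', '**', '*'),
    '--prune', glob('**', '.eslint', '**', '*'),
    '--total-size-limit=10m',
    '--',
    databasePath,
  ];
}
```

Passing the returned array to `spawnSync` (without `shell: true`) also keeps a POSIX shell from expanding the `**` globs before `codeql` sees them.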
7 changes: 4 additions & 3 deletions extractors/cds/tools/autobuild.sh
@@ -1,4 +1,4 @@
#!/bin/sh
#!/usr/bin/env bash

set -eu

@@ -9,9 +9,10 @@ set -eu
# Any changes should be synchronized between these three places.

exec "${CODEQL_DIST}/codeql" database index-files \
--language cds \
--total-size-limit 10m \
--include-extension=.cds \
--language cds \
--prune '**/node_modules/**/*' \
--prune '**/.eslint/**/*' \
--total-size-limit=10m \
-- \
"$CODEQL_EXTRACTOR_CDS_WIP_DATABASE"
38 changes: 38 additions & 0 deletions extractors/cds/tools/index-files.cmd
@@ -0,0 +1,38 @@
@echo off

if "%~1"=="" (
echo Usage: %0 ^<response_file_path^>
exit /b 1
)

where node >nul 2>nul
if %ERRORLEVEL% neq 0 (
echo node executable is required ^(in PATH^) to run the 'index-files.js' script. Please install Node.js and try again.
exit /b 2
)

where npm >nul 2>nul
if %ERRORLEVEL% neq 0 (
echo npm executable is required ^(in PATH^) to install the dependencies for the 'index-files.js' script.
exit /b 3
)

set "_response_file_path=%~1"
set "_script_dir=%~dp0"

echo Checking response file for CDS files to index

if not exist "%_response_file_path%" (
echo 'codeql database index-files --language cds' command terminated early as response file '%_response_file_path%' does not exist or is empty. This is because no CDS files were selected or found.
exit /b 0
)

REM Change to the directory of this script to ensure that npm looks up
REM the package.json file in the correct directory and installs the
REM dependencies (i.e. node_modules) relative to this directory.
cd /d "%_script_dir%" && ^
echo Installing node package dependencies and running the 'index-files.js' script && ^
npm install --quiet --no-audit --no-fund --no-package-lock && ^
node "%_script_dir%index-files.js" "%_response_file_path%"

exit /b %ERRORLEVEL%
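index-files.cmd probes for `node` and `npm` with `where`, and index-files.sh does the same with `command -v`. If index-files.js ever needed the same check itself (for example before falling back to `npx`), a portable lookup could look like the sketch below. `findOnPath` is a hypothetical helper, and the default `PATHEXT` list is an assumption used only when that variable is unset.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the full path of `name` if it is found on PATH,
// else null. Roughly mirrors `where` on Windows and `command -v` on POSIX.
function findOnPath(name, envPath = process.env.PATH || '', platform = process.platform) {
  const isWin = platform === 'win32';
  const dirs = envPath.split(isWin ? ';' : ':').filter(Boolean);
  // On Windows, try each executable extension (assumed default list if PATHEXT is unset).
  const exts = isWin
    ? (process.env.PATHEXT || '.COM;.EXE;.BAT;.CMD').split(';').filter(Boolean)
    : [''];
  for (const dir of dirs) {
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      try {
        // On POSIX require the execute bit; on Windows existence is enough.
        fs.accessSync(candidate, isWin ? fs.constants.F_OK : fs.constants.X_OK);
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch {
        // Not present (or not executable) in this directory; keep looking.
      }
    }
  }
  return null;
}
```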
158 changes: 158 additions & 0 deletions extractors/cds/tools/index-files.js
@@ -0,0 +1,158 @@
const { execSync, spawnSync } = require('child_process');
const { existsSync, readFileSync, statSync } = require('fs');
const { arch, platform } = require('os');
const { dirname, join, resolve } = require('path');

console.log('Indexing CDS files');

const responseFile = process.argv[2];

const osPlatform = platform();
const osPlatformArch = arch();
console.log(`Detected OS platform=${osPlatform} : arch=${osPlatformArch}`);
const autobuildScriptName = osPlatform === 'win32' ? 'autobuild.cmd' : 'autobuild.sh';
const codeqlExe = osPlatform === 'win32' ? 'codeql.exe' : 'codeql';
const codeqlExePath = join(process.env.CODEQL_DIST, codeqlExe);
const npmInstallCmdWithArgs = 'npm install --quiet --no-audit --no-fund --no-package-lock';

// If the response file does not exist, terminate.
if (!existsSync(responseFile)) {
console.log(`'codeql database index-files --language cds' terminated early as response file '${responseFile}' does not exist. This is because no CDS files were selected or found.`);
process.exit(0);
}

// Read the response file and split it into lines (accepting both LF and CRLF
// line endings), removing empty lines with filter(Boolean).
const responseFiles = readFileSync(responseFile, 'utf-8').split(/\r?\n/).filter(Boolean);
// If the response file is empty, terminate.
if (statSync(responseFile).size === 0 || responseFiles.length === 0) {
console.log(`'codeql database index-files --language cds' terminated early as response file '${responseFile}' is empty. This is because no CDS files were selected or found.`);
process.exit(0);
}
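A sketch of the response-file handling above as a standalone function, assuming one path per line. `readResponseFile` is hypothetical; it accepts CRLF line endings (so Windows paths do not keep a trailing `\r`) and returns an empty array for a missing or empty file, so the caller only needs a `length === 0` check.

```javascript
const fs = require('fs');

// Hypothetical helper: read a response file (assumed: one path per line) and
// return its non-empty lines. A missing file yields an empty array.
function readResponseFile(responseFile) {
  if (!fs.existsSync(responseFile)) {
    return [];
  }
  return fs
    .readFileSync(responseFile, 'utf-8')
    .split(/\r?\n/)   // handle both LF and CRLF line endings
    .filter(Boolean); // drop blank lines, including the one after a final newline
}
```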

// Determine if the cds command is available. If not, install the CDS development kit
// (cds-dk) in the appropriate directories and use npx to run the cds command from there.
let cdsCommand = 'cds';
try {
execSync('cds --version', { stdio: 'ignore' });
} catch {
console.log('Pre-installing cds compiler');

/**
* Find all the directories containing a package.json with a dependency on `@sap/cds`,
* where the directory contains at least one of the files listed in the response file
* (e.g. the cds files we want to extract).
*
* We then install the CDS development kit (`@sap/cds-dk`) in each directory, which
* makes the `cds` command usable from the npx command within that directory.
*
* Nested package.json files simply cause the package to be installed in the parent
* node_modules directory.
*
* TODO : fix implementation or change ^comment^ to reflect the actual implementation.
*
* We also ensure we skip node_modules, as we can end up in a recursive loop.
*/
const packageJsonDirs = new Set();
responseFiles.forEach(file => {
let dir = dirname(file);
while (dir !== resolve(dir, '..')) {
if (existsSync(join(dir, 'package.json')) && readFileSync(join(dir, 'package.json'), 'utf-8').includes('@sap/cds')) {
packageJsonDirs.add(dir);
break;
}
dir = resolve(dir, '..');
}
});

packageJsonDirs.forEach(dir => {
console.log(`Installing @sap/cds-dk into ${dir} to enable CDS compilation.`);
execSync(`${npmInstallCmdWithArgs} @sap/cds-dk`, { cwd: dir });
execSync(npmInstallCmdWithArgs, { cwd: dir });
});

/**
* Use the `npx` command to dynamically install the CDS development kit (`@sap/cds-dk`)
* package if necessary, which then provides the `cds` command line tool in directories
* which are not covered by the package.json install command approach above.
*/
cdsCommand = 'npx -y --package @sap/cds-dk cds';
}
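The loop above decides whether a directory is a CAP project by searching its package.json for the substring `@sap/cds`, so a manifest that only mentions it in a script or description also matches. A sketch that parses the manifest instead; `findCdsProjectDir` is hypothetical, and treating only `dependencies` and `devDependencies` as relevant is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: walk up from the directory of `cdsFile` and return the
// first directory whose package.json declares @sap/cds as a (dev)dependency.
function findCdsProjectDir(cdsFile) {
  let dir = path.dirname(path.resolve(cdsFile));
  while (true) {
    const manifest = path.join(dir, 'package.json');
    if (fs.existsSync(manifest)) {
      try {
        const pkg = JSON.parse(fs.readFileSync(manifest, 'utf-8'));
        const deps = { ...pkg.dependencies, ...pkg.devDependencies };
        if (Object.prototype.hasOwnProperty.call(deps, '@sap/cds')) {
          return dir;
        }
      } catch {
        // Unparseable package.json: ignore it and keep walking up.
      }
    }
    const parent = path.dirname(dir);
    if (parent === dir) {
      return null; // reached the filesystem root
    }
    dir = parent;
  }
}
```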

console.log('Processing CDS files to JSON');

/**
* Run the cds compile command on each file in the response files list, outputting the
* compiled JSON to a file with the same name but with a .json extension appended.
*/
responseFiles.forEach(cdsFile => {
const cdsJsonFile = `${cdsFile}.json`;
console.log(`Processing CDS file ${cdsFile} to: ${cdsJsonFile}`);
const result = spawnSync(cdsCommand, ['compile', cdsFile, '-2', 'json', '-o', cdsJsonFile, '--locations'], { shell: true });
if (result.error || result.status !== 0) {
// result.stderr can be null if the process could not be spawned at all.
const stderrTruncated = String(result.stderr ?? '').split('\n').filter(line => line.startsWith('[ERROR]')).slice(-4).join('\n');
const errorMessage = `Could not compile the file ${cdsFile}.\nReported error(s):\n\`\`\`\n${stderrTruncated}\n\`\`\``;
console.log(errorMessage);
// Pass the arguments as an array (no shell) so that the backticks, quotes and
// newlines in the Markdown message reach the CLI unchanged on every platform.
spawnSync(codeqlExePath, [
'database', 'add-diagnostic',
'--extractor-name', 'cds',
'--ready-for-status-page',
'--source-id=cds/compilation-failure',
'--source-name=Failure to compile one or more SAP CAP CDS files',
'--severity=error',
`--markdown-message=${errorMessage}`,
`--file-path=${cdsFile}`,
'--',
process.env.CODEQL_EXTRACTOR_CDS_WIP_DATABASE
], { stdio: 'inherit' });
}
});
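The compile step keeps the last four `[ERROR]` lines from `cds compile` and wraps them in a Markdown code fence for the status page. A sketch that splits this into two hypothetical helpers, `summarizeCdsErrors` and `diagnosticArgs`, reusing the `codeql database add-diagnostic` options from the loop above. The returned array is intended for `spawnSync(codeqlExePath, args)` without a shell, so backticks in the message are never interpreted by `sh` or `cmd.exe`.

```javascript
// Hypothetical helper: keep only the last `max` lines starting with "[ERROR]".
function summarizeCdsErrors(stderr, max = 4) {
  return String(stderr ?? '')
    .split(/\r?\n/)
    .filter(line => line.startsWith('[ERROR]'))
    .slice(-max)
    .join('\n');
}

// Hypothetical helper: argument vector for `codeql database add-diagnostic`,
// using the same options as index-files.js. Each value is its own argv
// entry, so no shell quoting is involved.
function diagnosticArgs(cdsFile, markdownMessage, databasePath) {
  return [
    'database', 'add-diagnostic',
    '--extractor-name', 'cds',
    '--ready-for-status-page',
    '--source-id=cds/compilation-failure',
    '--source-name=Failure to compile one or more SAP CAP CDS files',
    '--severity=error',
    `--markdown-message=${markdownMessage}`,
    `--file-path=${cdsFile}`,
    '--',
    databasePath,
  ];
}
```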

// Check if the JavaScript (JS) extractor variables are set, and set them if not.
if (!process.env.CODEQL_EXTRACTOR_JAVASCRIPT_ROOT) {
// Find the JS extractor location (quoting the CLI path in case it contains spaces).
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_ROOT = execSync(`"${codeqlExePath}" resolve extractor --language=javascript`).toString().trim();
// Set the JAVASCRIPT extractor environment variables to the same as the CDS
// extractor environment variables so that the JS extractor will write to the
// CDS database.
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_WIP_DATABASE = process.env.CODEQL_EXTRACTOR_CDS_WIP_DATABASE;
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_DIAGNOSTIC_DIR = process.env.CODEQL_EXTRACTOR_CDS_DIAGNOSTIC_DIR;
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_LOG_DIR = process.env.CODEQL_EXTRACTOR_CDS_LOG_DIR;
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR = process.env.CODEQL_EXTRACTOR_CDS_SCRATCH_DIR;
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_TRAP_DIR = process.env.CODEQL_EXTRACTOR_CDS_TRAP_DIR;
process.env.CODEQL_EXTRACTOR_JAVASCRIPT_SOURCE_ARCHIVE_DIR = process.env.CODEQL_EXTRACTOR_CDS_SOURCE_ARCHIVE_DIR;
}
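The six assignments above point the JavaScript extractor at the CDS extractor's database and working directories. The same mapping expressed as data, in a sketch with a hypothetical `mirrorCdsEnvToJavaScript` that returns a new environment object (suitable for the `env` option of `execSync` or `spawnSync`) instead of mutating `process.env`.

```javascript
// Suffixes copied from CODEQL_EXTRACTOR_CDS_* to CODEQL_EXTRACTOR_JAVASCRIPT_*,
// matching the six assignments in index-files.js.
const MIRRORED_SUFFIXES = [
  'WIP_DATABASE', 'DIAGNOSTIC_DIR', 'LOG_DIR',
  'SCRATCH_DIR', 'TRAP_DIR', 'SOURCE_ARCHIVE_DIR',
];

// Hypothetical helper: return a copy of `env` with the JavaScript extractor
// variables pointed at the CDS extractor's locations. Unset CDS variables
// are skipped rather than copied.
function mirrorCdsEnvToJavaScript(env, javascriptRoot) {
  const out = { ...env, CODEQL_EXTRACTOR_JAVASCRIPT_ROOT: javascriptRoot };
  for (const suffix of MIRRORED_SUFFIXES) {
    const value = env[`CODEQL_EXTRACTOR_CDS_${suffix}`];
    if (value !== undefined) {
      out[`CODEQL_EXTRACTOR_JAVASCRIPT_${suffix}`] = value;
    }
  }
  return out;
}
```

Skipping unset variables matters because Node.js converts any value assigned to a `process.env` property to a string, so assigning `undefined` stores the literal text `undefined`.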

let excludeFilters = '';
/**
* Check if LGTM_INDEX_FILTERS is already set. This typically happens if
* "paths" and/or "paths-ignore" is set in the lgtm.yml file.
*/
if (process.env.LGTM_INDEX_FILTERS) {
console.log(`Found $LGTM_INDEX_FILTERS already set to:\n${process.env.LGTM_INDEX_FILTERS}`);
const allowedExcludePatterns = [
join('exclude:**', '*'),
join('exclude:**', '*.*'),
];
/**
* If it is set, we will try to honor the paths-ignore filter.
*
* Split by `\n` and find all the entries that start with exclude, with some
* exclusions allowed for supported glob patterns, and then join them back
* together with `\n`.
*/
excludeFilters = '\n' + process.env.LGTM_INDEX_FILTERS
.split('\n')
.filter(line =>
line.startsWith('exclude')
&&
!allowedExcludePatterns.includes(line)
).join('\n');
}
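A standalone sketch of the exclude-filter step, with a hypothetical `keepUserExcludes`. It compares against literal forward-slash patterns rather than values built with `path.join`, which on Windows would produce `exclude:**\*` and so never match a forward-slash user entry; that the filter syntax uses forward slashes on Windows too is an assumption I have not verified. Returning an array also avoids the lone trailing newline that `'\n' + ''` contributes when no exclude lines survive.

```javascript
// Catch-all patterns that are not carried over (the ones the script filters out).
const DROPPED_EXCLUDES = new Set(['exclude:**/*', 'exclude:**/*.*']);

// Hypothetical helper: keep the user's exclude lines from an existing
// LGTM_INDEX_FILTERS value, minus the catch-all patterns above.
function keepUserExcludes(existingFilters) {
  if (!existingFilters) return [];
  return existingFilters
    .split(/\r?\n/)
    .filter(line => line.startsWith('exclude') && !DROPPED_EXCLUDES.has(line));
}
```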

// Restrict extraction to the .cds.json and .cds files, excluding anything under node_modules.
const lgtmIndexFiltersPatterns = join(
'exclude:**', '*.*\ninclude:**', '*.cds.json\ninclude:**', '*.cds\nexclude:**', 'node_modules', '**', '*.*'
);
process.env.LGTM_INDEX_FILTERS = `${lgtmIndexFiltersPatterns}${excludeFilters}`;
console.log(`Setting $LGTM_INDEX_FILTERS to:\n${process.env.LGTM_INDEX_FILTERS}`);
process.env.LGTM_INDEX_TYPESCRIPT = 'NONE';
// Configure to copy over the .cds files as well, by pretending they are JSON.
process.env.LGTM_INDEX_FILETYPES = '.cds:JSON';
// Ignore the LGTM_INDEX_INCLUDE variable for this purpose as it may explicitly
// refer to .js or .ts files.
delete process.env.LGTM_INDEX_INCLUDE;
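The `path.join` call that builds the base filters hides newline-separated entries inside path segments, and on Windows it would join them with backslashes. A sketch that writes the same four lines out explicitly (matching the `LGTM_INDEX_FILTERS` value exported in index-files.sh) and appends any kept user excludes; `buildIndexFilters` is hypothetical.

```javascript
// Base filters from index-files.js, one entry per line.
const CDS_INDEX_FILTERS = [
  'exclude:**/*.*',
  'include:**/*.cds.json',
  'include:**/*.cds',
  'exclude:**/node_modules/**/*.*',
];

// Hypothetical helper: combine the base CDS filters with any user excludes
// into a newline-separated LGTM_INDEX_FILTERS value.
function buildIndexFilters(userExcludes = []) {
  return [...CDS_INDEX_FILTERS, ...userExcludes].join('\n');
}
```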

console.log('Extracting the cds.json files');

// Invoke the JS autobuilder to index the .cds.json files only.
const autobuildScriptPath = join(process.env.CODEQL_EXTRACTOR_JAVASCRIPT_ROOT, 'tools', autobuildScriptName);
// Quote the script path in case it contains spaces.
execSync(`"${autobuildScriptPath}"`, { stdio: 'inherit' });
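The final step runs the JavaScript extractor's autobuild script through `execSync`, that is, through a shell. A sketch of the same call with `spawnSync`, assuming the `tools/autobuild.{cmd,sh}` layout used above. Since the April 2024 security releases, Node.js refuses to spawn `.cmd`/`.bat` files on Windows unless `shell: true` is set, so the sketch enables the shell only there and quotes the path for it. `autobuildScriptPath` and `runJavaScriptAutobuild` are hypothetical helpers.

```javascript
const { spawnSync } = require('child_process');
const path = require('path');

// Hypothetical helper: pick the platform-specific autobuild script.
function autobuildScriptPath(javascriptRoot, platform = process.platform) {
  const name = platform === 'win32' ? 'autobuild.cmd' : 'autobuild.sh';
  return path.join(javascriptRoot, 'tools', name);
}

// Hypothetical helper: run the script with inherited stdio and return its exit code.
function runJavaScriptAutobuild(javascriptRoot, env = process.env) {
  const script = autobuildScriptPath(javascriptRoot);
  const onWindows = process.platform === 'win32';
  const result = spawnSync(onWindows ? `"${script}"` : script, [], {
    stdio: 'inherit',
    env,
    shell: onWindows, // .cmd files can only be launched through a shell on Windows
  });
  if (result.error) throw result.error;
  return result.status;
}
```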
122 changes: 32 additions & 90 deletions extractors/cds/tools/index-files.sh
@@ -1,104 +1,46 @@
#!/bin/bash
#!/usr/bin/env bash

set -eu

echo "Indexing CDS files"

# Check if the list of files is empty
response_file="$1"

# If the response_file doesn't exist, terminate:
if [ ! -f "$response_file" ]; then
echo "codeql database index-files --language cds terminated early as response file '$response_file' does not exist. This is because no CDS files were selected or found."
exit 0
if [ $# -ne 1 ]
then
echo "Usage: $0 <response_file_path>"
exit 1
fi

# If the response_file is empty, terminate
if [ ! -s "$response_file" ]; then
echo "codeql database index-files --language cds terminated early as response file '$response_file' is empty. This is because no CDS files were selected or found."
exit 0
if ! command -v node > /dev/null
then
echo "node executable is required (in PATH) to run the 'index-files.js' script. Please install Node.js and try again."
exit 2
fi

# Determine if we have the cds command available, and if not, install the cds development kit
# in the appropriate directories
if ! command -v cds &> /dev/null
if ! command -v npm > /dev/null
then
echo "Pre-installing cds compiler"

# Find all the directories containing a package.json with a dependency on @sap/cds, where
# the directory contains at least one of the files listed in the response file (e.g. the
# cds files we want to extract).
#
# We then install the cds development kit (@sap/cds-dk) in each directory, which makes the
# `cds` command usable from the npx command within that directory.
#
# Nested package.json files simply cause the package to be installed in the parent node_modules
# directory.
#
# We also ensure we skip node_modules, as we can end up in a recursive loop
find . -type d -name node_modules -prune -false -o -type f \( -iname 'package.json' \) -exec grep -ql '@sap/cds' {} \; -execdir bash -c "grep -q \"^\$(pwd)\(/\|$\)\" \"$response_file\"" \; -execdir bash -c "echo \"Installing @sap/cds-dk into \$(pwd) to enable CDS compilation.\"" \; -execdir npm install --silent @sap/cds-dk \; -execdir npm install --silent \;

# Use the npx command to dynamically install the cds development kit (@sap/cds-dk) package if necessary,
# which then provides the cds command line tool in directories which are not covered by the package.json
# install command approach above
cds_command="npx -y --package @sap/cds-dk cds"
else
cds_command="cds"
echo "npm executable is required (in PATH) to install the dependencies for the 'index-files.js' script."
exit 3
fi

echo "Processing CDS files to JSON"
_response_file_path="$1"
_script_dir=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Run the cds compile command on each file in the response file, outputting the compiled JSON to a file with
# the same name
while IFS= read -r cds_file; do
echo "Processing CDS file $cds_file to:"
if ! $cds_command compile "$cds_file" -2 json -o "$cds_file.json" --locations 2> "$cds_file.err"; then
stderr_truncated=`grep "^\[ERROR\]" "$cds_file.err" | tail -n 4`
error_message=$'Could not compile the file '"$cds_file"$'.\nReported error(s):\n```\n'"$stderr_truncated"$'\n```'
echo "$error_message"
# Log an error diagnostic which appears on the status page
"$CODEQL_DIST/codeql" database add-diagnostic --extractor-name cds --ready-for-status-page --source-id cds/compilation-failure --source-name "Failure to compile one or more SAP CAP CDS files" --severity error --markdown-message "$error_message" --file-path "$cds_file" "$CODEQL_EXTRACTOR_CDS_WIP_DATABASE"
fi
done < "$response_file"
echo "Checking response file for CDS files to index"

# Check if the JS extractor variables are set, and set them if not
if [ -z "${CODEQL_EXTRACTOR_JAVASCRIPT_ROOT:-}" ]; then
# Find the JavaScript extractor location
export CODEQL_EXTRACTOR_JAVASCRIPT_ROOT="$("$CODEQL_DIST/codeql" resolve extractor --language=javascript)"

# Set the JAVASCRIPT extractor environment variables to the same as the CDS extractor environment variables
# so that the JS extractor will write to the CDS database
export CODEQL_EXTRACTOR_JAVASCRIPT_WIP_DATABASE="$CODEQL_EXTRACTOR_CDS_WIP_DATABASE"
export CODEQL_EXTRACTOR_JAVASCRIPT_DIAGNOSTIC_DIR="$CODEQL_EXTRACTOR_CDS_DIAGNOSTIC_DIR"
export CODEQL_EXTRACTOR_JAVASCRIPT_LOG_DIR="$CODEQL_EXTRACTOR_CDS_LOG_DIR"
export CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR="$CODEQL_EXTRACTOR_CDS_SCRATCH_DIR"
export CODEQL_EXTRACTOR_JAVASCRIPT_TRAP_DIR="$CODEQL_EXTRACTOR_CDS_TRAP_DIR"
export CODEQL_EXTRACTOR_JAVASCRIPT_SOURCE_ARCHIVE_DIR="$CODEQL_EXTRACTOR_CDS_SOURCE_ARCHIVE_DIR"
fi

# Check if LGTM_INDEX_FILTERS is already set
# This typically happens if "paths" or "paths-ignore" are set in the LGTM.yml file
if [ -z "${LGTM_INDEX_FILTERS:-}" ]; then
exclude_filters=""
else
echo $'Found \$LGTM_INDEX_FILTERS already set to:\n'"$LGTM_INDEX_FILTERS"
# If it is set, we will try to honour the paths-ignore filter
# Split by \n and find all the entries that start with exclude, excluding "exclude:**/*" and "exclude:**/*.*"
# and then join them back together with \n
exclude_filters=$'\n'"$(echo "$LGTM_INDEX_FILTERS" | grep '^exclude' | grep -v 'exclude:\*\*/\*\|exclude:\*\*/\*\.\*')"
# Terminate early if the _response_file_path doesn't exist or is empty,
# which indicates that no CDS files were selected or found.
if [ ! -f "$_response_file_path" ] || [ ! -s "$_response_file_path" ]
then
echo "'codeql database index-files --language cds' command terminated early as response file '$_response_file_path' does not exist or is empty. This is because no CDS files were selected or found."
# Exit without error to avoid failing any calling (javascript)
# extractor, and allow the tool to report the lack of coverage
# for CDS files.
exit 0
fi

# Enable extraction of the cds.json files only
export LGTM_INDEX_FILTERS=$'exclude:**/*.*\ninclude:**/*.cds.json\ninclude:**/*.cds\nexclude:**/node_modules/**/*.*'"$exclude_filters"
echo "Setting \$LGTM_INDEX_FILTERS to:\n$LGTM_INDEX_FILTERS"
export LGTM_INDEX_TYPESCRIPT="NONE"
# Configure to copy over the CDS files as well, by pretending they are JSON
export LGTM_INDEX_FILETYPES=".cds:JSON"
# Ignore the LGTM_INDEX_INCLUDE variable for this purpose, as it may
# refer explicitly to .ts or .js files
unset LGTM_INDEX_INCLUDE

echo "Extracting the cds.json files"

# Invoke the JavaScript autobuilder to index the .cds.json files only
"$CODEQL_EXTRACTOR_JAVASCRIPT_ROOT"/tools/autobuild.sh
# Change to the directory of this script to ensure that npm looks up
# the package.json file in the correct directory and installs the
# dependencies (i.e. node_modules) relative to this directory.
cd "$_script_dir" && \
echo "Installing node package dependencies" && \
npm install --quiet --no-audit --no-fund --no-package-lock && \
echo "Running the 'index-files.js' script" && \
node "$_script_dir/index-files.js" "$_response_file_path"