[analyzer] Refactor CallDescription match mode (NFC) (#83432)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
llvm/llvm-project#81597

To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.

Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
 - `CDF_None` was the default "match anything" mode,
 - `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).

Instead of this "Flags" model I'm switching to a plain enumeration
because I don't think that there is a natural use case to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
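
To make the effect of the matching modes concrete, here is a minimal, self-contained sketch. It is not the analyzer's code: `ToyCallee`, `ToyMode`, and `ToyDescription` are hypothetical stand-ins (the real implementation inspects a `FunctionDecl` and also handles qualified names, argument counts, and the `CLibrary` heuristics). It only illustrates how a pattern for `free` stops matching a method named `free` once the mode is restricted.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a callee declaration; the real analyzer
// looks at a clang FunctionDecl instead.
struct ToyCallee {
  std::string Name;
  bool IsMethod; // true for C++ methods, including static ones
};

// Simplified mirror of CallDescription::Mode; CLibrary is left out because
// its heuristics live in CheckerContext::isCLibraryFunction().
enum class ToyMode { SimpleFunc, CXXMethod, Unspecified };

struct ToyDescription {
  ToyMode MatchAs;
  std::string Name;

  bool matches(const ToyCallee &Callee) const {
    assert(!Name.empty() && "a pattern needs a name");
    // Reject callees of the wrong kind before comparing names.
    if (MatchAs == ToyMode::SimpleFunc && Callee.IsMethod)
      return false;
    if (MatchAs == ToyMode::CXXMethod && !Callee.IsMethod)
      return false;
    // Unspecified falls through and only compares the name.
    return Callee.Name == Name;
  }
};
```

With these definitions, `ToyDescription{ToyMode::SimpleFunc, "free"}` accepts `ToyCallee{"free", false}` and rejects `ToyCallee{"free", true}`, while the `Unspecified` variant accepts both, which is the ambiguity described above.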
NagyDonat authored Mar 4, 2024
1 parent bbeb946 commit 52a460f
Showing 6 changed files with 138 additions and 101 deletions.
@@ -27,20 +27,48 @@ class IdentifierInfo;

namespace clang {
namespace ento {

enum CallDescriptionFlags : unsigned {
CDF_None = 0,

/// Describes a C standard function that is sometimes implemented as a macro
/// that expands to a compiler builtin with some __builtin prefix.
/// The builtin may as well have a few extra arguments on top of the requested
/// number of arguments.
CDF_MaybeBuiltin = 1 << 0,
};

/// This class represents a description of a function call using the number of
/// arguments and the name of the function.
/// A `CallDescription` is a pattern that can be used to _match_ calls
/// based on the qualified name and the argument/parameter counts.
class CallDescription {
public:
enum class Mode {
/// Match calls to functions from the C standard library. On some platforms
/// some functions may be implemented as macros that expand to calls to
/// built-in variants of the given functions, so in this mode we use some
/// heuristics to recognize these implementation-defined variants:
/// - We also accept calls where the name is derived from the specified
/// name by adding "__builtin" or similar prefixes/suffixes.
/// - We also accept calls where the number of arguments or parameters is
/// greater than the specified value.
/// For the exact heuristics, see CheckerContext::isCLibraryFunction().
/// Note that functions whose declaration context is not a TU (e.g.
/// methods, functions in namespaces) are not accepted as C library
/// functions.
/// FIXME: If I understand it correctly, this discards calls where C++ code
/// refers a C library function through the namespace `std::` via headers
/// like <cstdlib>.
CLibrary,

/// Matches "simple" functions that are not methods. (Static methods are
/// methods.)
SimpleFunc,

/// Matches a C++ method (may be static, may be virtual, may be an
/// overloaded operator, a constructor or a destructor).
CXXMethod,

/// Match any CallEvent that is not an ObjCMethodCall.
/// FIXME: Previously this was the default behavior of CallDescription, but
/// its use should be replaced by a more specific mode almost everywhere.
Unspecified,

/// FIXME: Add support for ObjCMethodCall events (I'm not adding it because
/// I'm not familiar with Objective-C). Note that currently an early return
/// in `bool matches(const CallEvent &Call) const;` discards all
/// Objective-C method calls.
};

private:
friend class CallEvent;
using MaybeCount = std::optional<unsigned>;

@@ -50,20 +78,26 @@ class CallDescription {
std::vector<std::string> QualifiedName;
MaybeCount RequiredArgs;
MaybeCount RequiredParams;
int Flags;
Mode MatchAs;

public:
/// Constructs a CallDescription object.
///
/// @param MatchAs Specifies the kind of the call that should be matched.
///
/// @param QualifiedName The list of the name qualifiers of the function that
/// will be matched. The user is allowed to skip any of the qualifiers.
/// For example, {"std", "basic_string", "c_str"} would match both
/// std::basic_string<...>::c_str() and std::__1::basic_string<...>::c_str().
///
/// @param RequiredArgs The number of arguments that is expected to match a
/// call. Omit this parameter to match every occurrence of call with a given
/// name regardless the number of arguments.
CallDescription(CallDescriptionFlags Flags, ArrayRef<StringRef> QualifiedName,
/// @param RequiredArgs The expected number of arguments that are passed to
/// the function. Omit this parameter (or pass std::nullopt) to match every
/// occurrence without checking the argument count in the call.
///
/// @param RequiredParams The expected number of parameters in the function
/// definition that is called. Omit this parameter to match every occurrence
/// without checking the parameter count in the definition.
CallDescription(Mode MatchAs, ArrayRef<StringRef> QualifiedName,
MaybeCount RequiredArgs = std::nullopt,
MaybeCount RequiredParams = std::nullopt);

@@ -222,6 +256,10 @@ template <typename T> class CallDescriptionMap {
}
};

/// Enumerators of this enum class are used to construct CallDescription
/// objects; in that context the fully qualified name is needlessly verbose.
using CDM = CallDescription::Mode;

/// An immutable set of CallDescriptions.
/// Checkers can efficiently decide if a given CallEvent matches any
/// CallDescription in the set.
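
The constructor pair and the `CDM` alias in this header can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not the real class: `ToyPattern`, `TPM`, `mode()`, and `functionName()` are hypothetical names, `std::vector<std::string>` replaces `ArrayRef<StringRef>`, and the argument and parameter counts are omitted. It shows a mode-taking constructor, a legacy constructor that delegates with the catch-all mode, and a short alias for the nested enum.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the constructor pair; not the real clang class.
class ToyPattern {
public:
  enum class Mode { CLibrary, SimpleFunc, CXXMethod, Unspecified };

  // New-style constructor: the caller states what kind of call to match.
  ToyPattern(Mode MatchAs, std::vector<std::string> QualifiedName)
      : QualifiedName(std::move(QualifiedName)), MatchAs(MatchAs) {
    assert(!this->QualifiedName.empty() && "a pattern needs a name");
  }

  // Legacy form without a mode: delegates with the catch-all mode.
  ToyPattern(std::vector<std::string> QualifiedName)
      : ToyPattern(Mode::Unspecified, std::move(QualifiedName)) {}

  Mode mode() const { return MatchAs; }

  // The last qualifier is the name of the function itself.
  const std::string &functionName() const { return QualifiedName.back(); }

private:
  std::vector<std::string> QualifiedName;
  Mode MatchAs;
};

// Short alias in the spirit of `CDM`, so table entries stay compact.
using TPM = ToyPattern::Mode;
```

In this model, `ToyPattern(TPM::CLibrary, {"memcpy"})` records the `CLibrary` mode, whereas a pattern built from only a name list ends up in `Unspecified` mode.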
65 changes: 32 additions & 33 deletions clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -124,48 +124,47 @@ class CStringChecker : public Checker< eval::Call,
const CallEvent &)>;

CallDescriptionMap<FnCheck> Callbacks = {
{{CDF_MaybeBuiltin, {"memcpy"}, 3},
{{CDM::CLibrary, {"memcpy"}, 3},
std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Regular)},
{{CDF_MaybeBuiltin, {"wmemcpy"}, 3},
{{CDM::CLibrary, {"wmemcpy"}, 3},
std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Wide)},
{{CDF_MaybeBuiltin, {"mempcpy"}, 3},
{{CDM::CLibrary, {"mempcpy"}, 3},
std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Regular)},
{{CDF_None, {"wmempcpy"}, 3},
{{CDM::Unspecified, {"wmempcpy"}, 3},
std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Wide)},
{{CDF_MaybeBuiltin, {"memcmp"}, 3},
{{CDM::CLibrary, {"memcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
{{CDF_MaybeBuiltin, {"wmemcmp"}, 3},
{{CDM::CLibrary, {"wmemcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Wide)},
{{CDF_MaybeBuiltin, {"memmove"}, 3},
{{CDM::CLibrary, {"memmove"}, 3},
std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Regular)},
{{CDF_MaybeBuiltin, {"wmemmove"}, 3},
{{CDM::CLibrary, {"wmemmove"}, 3},
std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Wide)},
{{CDF_MaybeBuiltin, {"memset"}, 3}, &CStringChecker::evalMemset},
{{CDF_MaybeBuiltin, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
{{CDF_MaybeBuiltin, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
{{CDF_MaybeBuiltin, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
{{CDF_MaybeBuiltin, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
{{CDF_MaybeBuiltin, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
{{CDF_MaybeBuiltin, {"strcat"}, 2}, &CStringChecker::evalStrcat},
{{CDF_MaybeBuiltin, {"strncat"}, 3}, &CStringChecker::evalStrncat},
{{CDF_MaybeBuiltin, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
{{CDF_MaybeBuiltin, {"strlen"}, 1}, &CStringChecker::evalstrLength},
{{CDF_MaybeBuiltin, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
{{CDF_MaybeBuiltin, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDF_MaybeBuiltin, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDF_MaybeBuiltin, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
{{CDF_MaybeBuiltin, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
{{CDF_MaybeBuiltin, {"strcasecmp"}, 2}, &CStringChecker::evalStrcasecmp},
{{CDF_MaybeBuiltin, {"strncasecmp"}, 3},
&CStringChecker::evalStrncasecmp},
{{CDF_MaybeBuiltin, {"strsep"}, 2}, &CStringChecker::evalStrsep},
{{CDF_MaybeBuiltin, {"bcopy"}, 3}, &CStringChecker::evalBcopy},
{{CDF_MaybeBuiltin, {"bcmp"}, 3},
{{CDM::CLibrary, {"memset"}, 3}, &CStringChecker::evalMemset},
{{CDM::CLibrary, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
{{CDM::CLibrary, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
{{CDM::CLibrary, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
{{CDM::CLibrary, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
{{CDM::CLibrary, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
{{CDM::CLibrary, {"strcat"}, 2}, &CStringChecker::evalStrcat},
{{CDM::CLibrary, {"strncat"}, 3}, &CStringChecker::evalStrncat},
{{CDM::CLibrary, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
{{CDM::CLibrary, {"strlen"}, 1}, &CStringChecker::evalstrLength},
{{CDM::CLibrary, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
{{CDM::CLibrary, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDM::CLibrary, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDM::CLibrary, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
{{CDM::CLibrary, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
{{CDM::CLibrary, {"strcasecmp"}, 2}, &CStringChecker::evalStrcasecmp},
{{CDM::CLibrary, {"strncasecmp"}, 3}, &CStringChecker::evalStrncasecmp},
{{CDM::CLibrary, {"strsep"}, 2}, &CStringChecker::evalStrsep},
{{CDM::CLibrary, {"bcopy"}, 3}, &CStringChecker::evalBcopy},
{{CDM::CLibrary, {"bcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
{{CDF_MaybeBuiltin, {"bzero"}, 2}, &CStringChecker::evalBzero},
{{CDF_MaybeBuiltin, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
{{CDF_MaybeBuiltin, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
{{CDF_MaybeBuiltin, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
{{CDM::CLibrary, {"bzero"}, 2}, &CStringChecker::evalBzero},
{{CDM::CLibrary, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
{{CDM::CLibrary, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
{{CDM::CLibrary, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
};

// These require a bit of special handling.
50 changes: 22 additions & 28 deletions clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -718,28 +718,24 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
{{{"isupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{{"isxdigit"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},

{{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncat)}},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrncat)}},
TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrlcpy)}},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrlcpy)}},
TR::Prop({{1, 2}}, {{0}})},
{{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrlcat)}},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrlcat)}},
TR::Prop({{1, 2}}, {{0}})},
{{CDF_MaybeBuiltin, {{"snprintf"}}},
{{CDM::CLibrary, {{"snprintf"}}},
TR::Prop({{1}, 3}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"sprintf"}}},
{{CDM::CLibrary, {{"sprintf"}}},
TR::Prop({{1}, 2}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"strcpy"}}},
{{CDM::CLibrary, {{"strcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"stpcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"strcat"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"wcsncat"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"stpcpy"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"strcat"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"wcsncat"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"strdupa"}}},
TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDF_MaybeBuiltin, {{"wcsdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDM::CLibrary, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDM::CLibrary, {{"strdupa"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDM::CLibrary, {{"wcsdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},

// Sinks
{{{"system"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
@@ -753,31 +749,29 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
{{{"execvp"}}, TR::Sink({{0, 1}}, MsgSanitizeSystemArgs)},
{{{"execvpe"}}, TR::Sink({{0, 1, 2}}, MsgSanitizeSystemArgs)},
{{{"dlopen"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
{{CDF_MaybeBuiltin, {{"malloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {{"calloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {{"alloca"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {{"memccpy"}}},
TR::Sink({{3}}, MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {{"realloc"}}},
TR::Sink({{1}}, MsgTaintedBufferSize)},
{{CDM::CLibrary, {{"malloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDM::CLibrary, {{"calloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDM::CLibrary, {{"alloca"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
{{CDM::CLibrary, {{"memccpy"}}}, TR::Sink({{3}}, MsgTaintedBufferSize)},
{{CDM::CLibrary, {{"realloc"}}}, TR::Sink({{1}}, MsgTaintedBufferSize)},
{{{{"setproctitle"}}}, TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
{{{{"setproctitle_fast"}}},
TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},

// SinkProps
{{CDF_MaybeBuiltin, BI.getName(Builtin::BImemcpy)},
{{CDM::CLibrary, BI.getName(Builtin::BImemcpy)},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {BI.getName(Builtin::BImemmove)}},
{{CDM::CLibrary, {BI.getName(Builtin::BImemmove)}},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncpy)}},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrncpy)}},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrndup)}},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrndup)}},
TR::SinkProp({{1}}, {{0, 1}}, {{ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDF_MaybeBuiltin, {{"bcopy"}}},
{{CDM::CLibrary, {{"bcopy"}}},
TR::SinkProp({{2}}, {{0, 2}}, {{1}}, MsgTaintedBufferSize)}};

// `getenv` returns taint only in untrusted environments.
8 changes: 4 additions & 4 deletions clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -410,13 +410,13 @@ class MallocChecker
{{{"malloc"}, 3}, &MallocChecker::checkKernelMalloc},
{{{"calloc"}, 2}, &MallocChecker::checkCalloc},
{{{"valloc"}, 1}, &MallocChecker::checkBasicAlloc},
{{CDF_MaybeBuiltin, {"strndup"}, 2}, &MallocChecker::checkStrdup},
{{CDF_MaybeBuiltin, {"strdup"}, 1}, &MallocChecker::checkStrdup},
{{CDM::CLibrary, {"strndup"}, 2}, &MallocChecker::checkStrdup},
{{CDM::CLibrary, {"strdup"}, 1}, &MallocChecker::checkStrdup},
{{{"_strdup"}, 1}, &MallocChecker::checkStrdup},
{{{"kmalloc"}, 2}, &MallocChecker::checkKernelMalloc},
{{{"if_nameindex"}, 1}, &MallocChecker::checkIfNameIndex},
{{CDF_MaybeBuiltin, {"wcsdup"}, 1}, &MallocChecker::checkStrdup},
{{CDF_MaybeBuiltin, {"_wcsdup"}, 1}, &MallocChecker::checkStrdup},
{{CDM::CLibrary, {"wcsdup"}, 1}, &MallocChecker::checkStrdup},
{{CDM::CLibrary, {"_wcsdup"}, 1}, &MallocChecker::checkStrdup},
{{{"g_malloc"}, 1}, &MallocChecker::checkBasicAlloc},
{{{"g_malloc0"}, 1}, &MallocChecker::checkGMalloc0},
{{{"g_try_malloc"}, 1}, &MallocChecker::checkBasicAlloc},
21 changes: 14 additions & 7 deletions clang/lib/StaticAnalyzer/Core/CallDescription.cpp
@@ -35,13 +35,13 @@ static MaybeCount readRequiredParams(MaybeCount RequiredArgs,
return std::nullopt;
}

ento::CallDescription::CallDescription(CallDescriptionFlags Flags,
ento::CallDescription::CallDescription(Mode MatchAs,
ArrayRef<StringRef> QualifiedName,
MaybeCount RequiredArgs /*= None*/,
MaybeCount RequiredParams /*= None*/)
: RequiredArgs(RequiredArgs),
RequiredParams(readRequiredParams(RequiredArgs, RequiredParams)),
Flags(Flags) {
MatchAs(MatchAs) {
assert(!QualifiedName.empty());
this->QualifiedName.reserve(QualifiedName.size());
llvm::transform(QualifiedName, std::back_inserter(this->QualifiedName),
@@ -52,7 +52,8 @@ ento::CallDescription::CallDescription(CallDescriptionFlags Flags,
ento::CallDescription::CallDescription(ArrayRef<StringRef> QualifiedName,
MaybeCount RequiredArgs /*= None*/,
MaybeCount RequiredParams /*= None*/)
: CallDescription(CDF_None, QualifiedName, RequiredArgs, RequiredParams) {}
: CallDescription(Mode::Unspecified, QualifiedName, RequiredArgs,
RequiredParams) {}

bool ento::CallDescription::matches(const CallEvent &Call) const {
// FIXME: Add ObjC Message support.
@@ -74,14 +75,20 @@ bool ento::CallDescription::matchesAsWritten(const CallExpr &CE) const {
return matchesImpl(FD, CE.getNumArgs(), FD->param_size());
}

bool ento::CallDescription::matchesImpl(const FunctionDecl *Callee,
size_t ArgCount,
bool ento::CallDescription::matchesImpl(const FunctionDecl *FD, size_t ArgCount,
size_t ParamCount) const {
const auto *FD = Callee;
if (!FD)
return false;

if (Flags & CDF_MaybeBuiltin) {
const bool isMethod = isa<CXXMethodDecl>(FD);

if (MatchAs == Mode::SimpleFunc && isMethod)
return false;

if (MatchAs == Mode::CXXMethod && !isMethod)
return false;

if (MatchAs == Mode::CLibrary) {
return CheckerContext::isCLibraryFunction(FD, getFunctionName()) &&
(!RequiredArgs || *RequiredArgs <= ArgCount) &&
(!RequiredParams || *RequiredParams <= ParamCount);
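
The count checks in the `CLibrary` branch above treat the requested numbers as lower bounds, which fits the note that a builtin variant may take a few extra arguments. A minimal stand-alone sketch of that comparison follows; `acceptsAtLeast` is a hypothetical helper name, and only the `MaybeCount` alias and the `<=` test are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

using MaybeCount = std::optional<unsigned>;

// "At least" check as in the CLibrary branch: an unset requirement accepts
// any count, a set requirement accepts that count or more.
bool acceptsAtLeast(MaybeCount Required, std::size_t Actual) {
  return !Required || *Required <= Actual;
}
```

For instance, `acceptsAtLeast(3u, 5)` holds and `acceptsAtLeast(3u, 2)` does not, while `acceptsAtLeast(std::nullopt, 0)` holds because no count was requested.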