contrib/expat: import expat 2.7.1

Changes: https://github.com/libexpat/libexpat/blob/R_2_7_1/expat/Changes
         https://github.com/libexpat/libexpat/blob/R_2_7_0/expat/Changes

Security:	CVE-2024-8176

(cherry picked from commit fe9278888fd4414abe2d922e469cf608005f4c65)
Philip Paeps 2025-04-02 16:56:02 +08:00
parent 54a94356c9
commit 6f7ee9ac03
28 changed files with 1780 additions and 265 deletions

@ -1,5 +1,5 @@
Copyright (c) 1998-2000 Thai Open Source Software Center Ltd and Clark Cooper
Copyright (c) 2001-2022 Expat maintainers
Copyright (c) 2001-2025 Expat maintainers
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the

@ -11,16 +11,23 @@
!! The following topics need *additional skilled C developers* to progress !!
!! in a timely manner or at all (loosely ordered by descending priority): !!
!! !!
!! - <blink>fixing a complex non-public security issue</blink>, !!
!! - teaming up on researching and fixing future security reports and !!
!! ClusterFuzz findings with few-days-max response times in communication !!
!! in order to (1) have a sound fix ready before the end of a 90 days !!
!! grace period and (2) in a sustainable manner, !!
!! - helping CPython Expat bindings with supporting Expat's billion laughs !!
!! attack protection API (https://github.com/python/cpython/issues/90949): !!
!! - XML_SetBillionLaughsAttackProtectionActivationThreshold !!
!! - XML_SetBillionLaughsAttackProtectionMaximumAmplification !!
!! - helping Perl's XML::Parser Expat bindings with supporting Expat's !!
!! security API (https://github.com/cpan-authors/XML-Parser/issues/102): !!
!! - XML_SetBillionLaughsAttackProtectionActivationThreshold !!
!! - XML_SetBillionLaughsAttackProtectionMaximumAmplification !!
!! - XML_SetReparseDeferralEnabled !!
!! - implementing and auto-testing XML 1.0r5 support !!
!! (needs discussion before pull requests), !!
!! - smart ideas on fixing the Autotools CMake files generation issue !!
!! without breaking CI (needs discussion before pull requests), !!
!! - the Windows binaries topic (needs requirements engineering first), !!
!! - pushing migration from `int` to `size_t` further !!
!! including edge-cases test coverage (needs discussion before anything). !!
!! !!
@ -30,6 +37,116 @@
!! THANK YOU! Sebastian Pipping -- Berlin, 2024-03-09 !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Release 2.7.1 Thu March 27 2025
Bug fixes:
#980 #989 Restore event pointer behavior from Expat 2.6.4
(that the fix to CVE-2024-8176 changed in 2.7.0);
affected API functions are:
- XML_GetCurrentByteCount
- XML_GetCurrentByteIndex
- XML_GetCurrentColumnNumber
- XML_GetCurrentLineNumber
- XML_GetInputContext
Other changes:
#976 #977 Autotools: Integrate files "fuzz/xml_lpm_fuzzer.{cpp,proto}"
with Automake that were missing from 2.7.0 release tarballs
#983 #984 Fix printf format specifiers for 32bit Emscripten
#992 docs: Promote OpenSSF Best Practices self-certification
#978 tests/benchmark: Resolve mistaken double close
#986 Address compiler warnings
#990 #993 Version info bumped from 11:1:10 (libexpat*.so.1.10.1)
to 11:2:10 (libexpat*.so.1.10.2); see https://verbump.de/
for what these numbers do
Infrastructure:
#982 CI: Start running Perl XML::Parser integration tests
#987 CI: Enforce Clang Static Analyzer clean code
#991 CI: Re-enable warning clang-analyzer-valist.Uninitialized
for clang-tidy
#981 CI: Cover compilation with musl
#983 #984 CI: Cover compilation with 32bit Emscripten
#976 #977 CI: Protect against fuzzer files missing from future
release archives
Special thanks to:
Berkay Eren Ürün
Matthew Fernandez
and
Perl XML::Parser
Release 2.7.0 Thu March 13 2025
Security fixes:
#893 #973 CVE-2024-8176 -- Fix crash from chaining a large number
of entities caused by stack overflow by resolving use of
recursion, for all three uses of entities:
- general entities in character data ("<e>&g1;</e>")
- general entities in attribute values ("<e k1='&g1;'/>")
- parameter entities ("%p1;")
Known impact is (reliable and easy) denial of service:
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:H/RL:O/RC:C
(Base Score: 7.5, Temporal Score: 7.2)
Please note that a layer of compression around XML can
significantly reduce the minimum attack payload size.
Other changes:
#935 #937 Autotools: Make generated CMake files look for
libexpat.@SO_MAJOR@.dylib on macOS
#925 Autotools: Sync CMake templates with CMake 3.29
#945 #962 #966 CMake: Drop support for CMake <3.13
#942 CMake: Small fuzzing related improvements
#921 docs: Add missing documentation of error code
XML_ERROR_NOT_STARTED that was introduced with 2.6.4
#941 docs: Document need for C++11 compiler for use from C++
#959 tests/benchmark: Fix a (harmless) TOCTTOU
#944 Windows: Fix installer target location of file xmlwf.xml
for CMake
#953 Windows: Address warning -Wunknown-warning-option
about -Wno-pedantic-ms-format from LLVM MinGW
#971 Address Cppcheck warnings
#969 #970 Mass-migrate links from http:// to https://
#947 #958 ..
#974 #975 Document changes since the previous release
#974 #975 Version info bumped from 11:0:10 (libexpat*.so.1.10.0)
to 11:1:10 (libexpat*.so.1.10.1); see https://verbump.de/
for what these numbers do
Infrastructure:
#926 tests: Increase robustness
#927 #932 ..
#930 #933 tests: Increase test coverage
#617 #950 ..
#951 #952 ..
#954 #955 .. Fuzzing: Add new fuzzer "xml_lpm_fuzzer" based on
#961 Google's libprotobuf-mutator ("LPM")
#957 Fuzzing|CI: Start producing fuzzing code coverage reports
#936 CI: Pass -q -q for LCOV >=2.1 in coverage.sh
#942 CI: Small fuzzing related improvements
#139 #203 ..
#791 #946 CI: Make GitHub Actions build using MSVC on Windows and
produce 32bit and 64bit Windows binaries
#956 CI: Get off of about-to-be-removed Ubuntu 20.04
#960 #964 CI: Start uploading to Coverity Scan for static analysis
#972 CI: Stop loading DTD from the internet to address flaky CI
#971 CI: Adapt to breaking changes in Cppcheck
Special thanks to:
Alexander Gieringer
Berkay Eren Ürün
Hanno Böck
Jann Horn
Mark Brand
Sebastian Andrzej Siewior
Snild Dolkow
Thomas Pröll
Tomas Korbar
valord577
and
Google Project Zero
Linutronix
Red Hat
Siemens
Release 2.6.4 Wed November 6 2024
Security fixes:
#915 CVE-2024-50602 -- Fix crash within function XML_ResumeParser
@ -46,6 +163,8 @@ Release 2.6.4 Wed November 6 2024
#904 tests: Resolve duplicate handler
#317 #918 tests: Improve tests on doctype closing (ex CVE-2019-15903)
#914 Fix signedness of format strings
#915 For use from C++, expat.h started requiring C++11 due to
use of C99 features
#919 #920 Version info bumped from 10:3:9 (libexpat*.so.1.9.3)
to 11:0:10 (libexpat*.so.1.10.0); see https://verbump.de/
for what these numbers do
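The CVSS vector quoted for CVE-2024-8176 in the 2.7.0 entry can be checked by hand. The sketch below assumes the metric weights and the integer-based Roundup function from the CVSS v3.1 specification. The function names are illustrative. It should reproduce the stated Base Score 7.5 and Temporal Score 7.2.

```c
/* CVSS v3.1 Roundup: the smallest value with one decimal place that is >= x,
   computed with integers to avoid floating-point artefacts. Assumes x >= 0. */
static double cvss_roundup(double x) {
  long int_input = (long)(x * 100000.0 + 0.5); /* round(x * 100000) */
  if (int_input % 10000 == 0) {
    return int_input / 100000.0;
  }
  return (int_input / 10000 + 1) / 10.0; /* integer division floors here */
}

/* Base and temporal scores for
   CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H/E:H/RL:O/RC:C */
static void cvss_cve_2024_8176(double *base, double *temporal) {
  /* Impact weights: Confidentiality None, Integrity None, Availability High */
  const double c = 0.0, i = 0.0, a = 0.56;
  /* Exploitability weights: AV Network, AC Low, PR None, UI None */
  const double av = 0.85, ac = 0.77, pr = 0.85, ui = 0.85;
  /* Temporal weights: Exploit Code Maturity High, Official Fix, Confirmed */
  const double e = 1.0, rl = 0.95, rc = 1.0;

  const double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
  const double impact = 6.42 * iss; /* formula for Scope Unchanged */
  const double exploitability = 8.22 * av * ac * pr * ui;
  const double sum = impact + exploitability;

  *base = impact <= 0.0 ? 0.0 : cvss_roundup(sum < 10.0 ? sum : 10.0);
  *temporal = cvss_roundup(*base * e * rl * rc);
}
```

Availability (High) is the only non-zero impact metric. The whole score is therefore driven by the denial-of-service aspect. "Official Fix" accounts for the drop from 7.5 to 7.2.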

@ -6,7 +6,7 @@
# \___/_/\_\ .__/ \__,_|\__|
# |_| XML parser
#
# Copyright (c) 2017-2023 Sebastian Pipping <sebastian@pipping.org>
# Copyright (c) 2017-2025 Sebastian Pipping <sebastian@pipping.org>
# Copyright (c) 2018 KangLin <kl222@126.com>
# Copyright (c) 2022 Johnny Jazeix <jazeix@gmail.com>
# Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com>
@ -96,6 +96,8 @@ EXTRA_DIST = \
conftools/expat.m4 \
conftools/get-version.sh \
\
fuzz/xml_lpm_fuzzer.cpp \
fuzz/xml_lpm_fuzzer.proto \
fuzz/xml_parsebuffer_fuzzer.c \
fuzz/xml_parse_fuzzer.c \
\

@ -22,7 +22,7 @@
# \___/_/\_\ .__/ \__,_|\__|
# |_| XML parser
#
# Copyright (c) 2017-2023 Sebastian Pipping <sebastian@pipping.org>
# Copyright (c) 2017-2025 Sebastian Pipping <sebastian@pipping.org>
# Copyright (c) 2018 KangLin <kl222@126.com>
# Copyright (c) 2022 Johnny Jazeix <jazeix@gmail.com>
# Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com>
@ -494,6 +494,8 @@ EXTRA_DIST = \
conftools/expat.m4 \
conftools/get-version.sh \
\
fuzz/xml_lpm_fuzzer.cpp \
fuzz/xml_lpm_fuzzer.proto \
fuzz/xml_parsebuffer_fuzzer.c \
fuzz/xml_parse_fuzzer.c \
\

@ -3,6 +3,7 @@
[![Packaging status](https://repology.org/badge/tiny-repos/expat.svg)](https://repology.org/metapackage/expat/versions)
[![Downloads SourceForge](https://img.shields.io/sourceforge/dt/expat?label=Downloads%20SourceForge)](https://sourceforge.net/projects/expat/files/)
[![Downloads GitHub](https://img.shields.io/github/downloads/libexpat/libexpat/total?label=Downloads%20GitHub)](https://github.com/libexpat/libexpat/releases)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10205/badge)](https://www.bestpractices.dev/projects/10205)
> [!CAUTION]
>
@ -11,7 +12,7 @@
> at the top of the `Changes` file.
# Expat, Release 2.6.4
# Expat, Release 2.7.1
This is Expat, a C99 library for parsing
[XML 1.0 Fourth Edition](https://www.w3.org/TR/2006/REC-xml-20060816/), started by
@ -22,9 +23,9 @@ are called when the parser discovers the associated structures in the
document being parsed. A start tag is an example of the kind of
structures for which you may register handlers.
Expat supports the following compilers:
Expat supports the following C99 compilers:
- GNU GCC >=4.5
- GNU GCC >=4.5 (for use from C) or GNU GCC >=4.8.1 (for use from C++)
- LLVM Clang >=3.5
- Microsoft Visual Studio >=16.0/2019 (rolling `${today} minus 5 years`)
@ -52,7 +53,7 @@ This approach leverages CMake's own [module `FindEXPAT`](https://cmake.org/cmake
Notice the *uppercase* `EXPAT` in the following example:
```cmake
cmake_minimum_required(VERSION 3.0) # or 3.10, see below
cmake_minimum_required(VERSION 3.10)
project(hello VERSION 1.0.0)
@ -62,12 +63,7 @@ add_executable(hello
hello.c
)
# a) for CMake >=3.10 (see CMake's FindEXPAT docs)
target_link_libraries(hello PUBLIC EXPAT::EXPAT)
# b) for CMake >=3.0
target_include_directories(hello PRIVATE ${EXPAT_INCLUDE_DIRS})
target_link_libraries(hello PUBLIC ${EXPAT_LIBRARIES})
```
### b) `find_package` with Config Mode
@ -85,7 +81,7 @@ or
Notice the *lowercase* `expat` in the following example:
```cmake
cmake_minimum_required(VERSION 3.0)
cmake_minimum_required(VERSION 3.10)
project(hello VERSION 1.0.0)
@ -295,7 +291,7 @@ EXPAT_ENABLE_INSTALL:BOOL=ON
// Use /MT flag (static CRT) when compiling in MSVC
EXPAT_MSVC_STATIC_CRT:BOOL=OFF
// Build fuzzers via ossfuzz for the expat library
// Build fuzzers via OSS-Fuzz for the expat library
EXPAT_OSSFUZZ_BUILD:BOOL=OFF
// Build a shared expat library

@ -11,7 +11,7 @@ dnl Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net>
dnl Copyright (c) 2000-2005 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
dnl Copyright (c) 2001-2003 Greg Stein <gstein@users.sourceforge.net>
dnl Copyright (c) 2006-2012 Karl Waclawek <karl@waclawek.net>
dnl Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
dnl Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
dnl Copyright (c) 2017 S. P. Zeidler <spz@netbsd.org>
dnl Copyright (c) 2017 Stephen Groat <stephen@groat.us>
dnl Copyright (c) 2017-2020 Joe Orton <jorton@redhat.com>
@ -85,7 +85,7 @@ dnl If the API changes incompatibly set LIBAGE back to 0
dnl
LIBCURRENT=11 # sync
LIBREVISION=0 # with
LIBREVISION=2 # with
LIBAGE=10 # CMakeLists.txt!
AC_CONFIG_HEADERS([expat_config.h])
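The Changes entries map libtool version info such as `11:2:10` to file names like `libexpat*.so.1.10.2`. Below is a minimal sketch of that arithmetic. It assumes GNU libtool's Linux naming scheme, `.so.(CURRENT-AGE).(AGE).(REVISION)`. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical result type: the three numbers after ".so." on Linux. */
struct so_version {
  int so_major;    /* CURRENT - AGE */
  int so_age;      /* AGE */
  int so_revision; /* REVISION */
};

/* Map libtool -version-info CURRENT:REVISION:AGE to the Linux file-name
   suffix. Returns 0 on success, or -1 for input libtool would reject
   (negative numbers or AGE > CURRENT). */
static int libtool_so_version(int current, int revision, int age,
                              struct so_version *out) {
  if (current < 0 || revision < 0 || age < 0 || age > current) {
    return -1;
  }
  out->so_major = current - age; /* changes when AGE is reset to 0 */
  out->so_age = age;
  out->so_revision = revision;
  return 0;
}

/* Print a name such as "libexpat.so.1.10.2". */
static void print_so_name(const char *libname, struct so_version v) {
  printf("%s.so.%d.%d.%d\n", libname, v.so_major, v.so_age, v.so_revision);
}
```

Take the values from configure.ac above: `LIBCURRENT=11`, `LIBREVISION=2`, `LIBAGE=10`. They yield major 1 and the suffix `1.10.2`, matching the 2.7.1 entry. Only the revision moved, which in libtool terms signals an interface-compatible change.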

@ -14,7 +14,7 @@
Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net>
Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2017-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017 Jakub Wilk <jwilk@jwilk.net>
Copyright (c) 2021 Tomas Korbar <tkorbar@redhat.com>
Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
@ -52,7 +52,7 @@
<div>
<h1>
The Expat XML Parser
<small>Release 2.6.4</small>
<small>Release 2.7.1</small>
</h1>
</div>
<div class="content">
@ -1267,6 +1267,11 @@ call-backs, except when parsing an external parameter entity and
<code>XML_STATUS_ERROR</code> otherwise. The possible error codes
are:</p>
<dl>
<dt><code>XML_ERROR_NOT_STARTED</code></dt>
<dd>
when stopping or suspending a parser before it has started,
added in Expat 2.6.4.
</dd>
<dt><code>XML_ERROR_SUSPENDED</code></dt>
<dd>when suspending an already suspended parser.</dd>
<dt><code>XML_ERROR_FINISHED</code></dt>

@ -5,7 +5,7 @@
\\$2 \(la\\$1\(ra\\$3
..
.if \n(.g .mso www.tmac
.TH XMLWF 1 "November 6, 2024" "" ""
.TH XMLWF 1 "March 27, 2025" "" ""
.SH NAME
xmlwf \- Determines if an XML document is well-formed
.SH SYNOPSIS

@ -9,7 +9,7 @@
Copyright (c) 2001 Scott Bronson <bronson@rinspin.com>
Copyright (c) 2002-2003 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
Copyright (c) 2009 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016 Ardo van Rangelrooij <ardo@debian.org>
Copyright (c) 2017 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2020 Joe Orton <jorton@redhat.com>
@ -21,7 +21,7 @@
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!ENTITY dhfirstname "<firstname>Scott</firstname>">
<!ENTITY dhsurname "<surname>Bronson</surname>">
<!ENTITY dhdate "<date>November 6, 2024</date>">
<!ENTITY dhdate "<date>March 27, 2025</date>">
<!-- Please adjust this^^ date whenever cutting a new release. -->
<!ENTITY dhsection "<manvolnum>1</manvolnum>">
<!ENTITY dhemail "<email>bronson@rinspin.com</email>">

@ -0,0 +1,464 @@
/*
__ __ _
___\ \/ /_ __ __ _| |_
/ _ \\ /| '_ \ / _` | __|
| __// \| |_) | (_| | |_
\___/_/\_\ .__/ \__,_|\__|
|_| XML parser
Copyright (c) 2022 Mark Brand <markbrand@google.com>
Copyright (c) 2025 Sebastian Pipping <sebastian@pipping.org>
Licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#if defined(NDEBUG)
# undef NDEBUG // because checks below rely on assert(...)
#endif
#include <assert.h>
#include <stdint.h>
#include <vector>
#include "expat.h"
#include "xml_lpm_fuzzer.pb.h"
#include "src/libfuzzer/libfuzzer_macro.h"
static const char *g_encoding = nullptr;
static const char *g_external_entity = nullptr;
static size_t g_external_entity_size = 0;
void
SetEncoding(const xml_lpm_fuzzer::Encoding &e) {
switch (e) {
case xml_lpm_fuzzer::Encoding::UTF8:
g_encoding = "UTF-8";
break;
case xml_lpm_fuzzer::Encoding::UTF16:
g_encoding = "UTF-16";
break;
case xml_lpm_fuzzer::Encoding::ISO88591:
g_encoding = "ISO-8859-1";
break;
case xml_lpm_fuzzer::Encoding::ASCII:
g_encoding = "US-ASCII";
break;
case xml_lpm_fuzzer::Encoding::NONE:
g_encoding = NULL;
break;
default:
g_encoding = "UNKNOWN";
break;
}
}
static int g_allocation_count = 0;
static std::vector<int> g_fail_allocations = {};
void *
MallocHook(size_t size) {
g_allocation_count += 1;
for (auto index : g_fail_allocations) {
if (index == g_allocation_count) {
return NULL;
}
}
return malloc(size);
}
void *
ReallocHook(void *ptr, size_t size) {
g_allocation_count += 1;
for (auto index : g_fail_allocations) {
if (index == g_allocation_count) {
return NULL;
}
}
return realloc(ptr, size);
}
void
FreeHook(void *ptr) {
free(ptr);
}
XML_Memory_Handling_Suite memory_handling_suite
= {MallocHook, ReallocHook, FreeHook};
void InitializeParser(XML_Parser parser);
// We want a parse function that supports resumption, so that we can cover the
// suspend/resume code.
enum XML_Status
Parse(XML_Parser parser, const char *input, int input_len, int is_final) {
enum XML_Status status = XML_Parse(parser, input, input_len, is_final);
while (status == XML_STATUS_SUSPENDED) {
status = XML_ResumeParser(parser);
}
return status;
}
// When the fuzzer is compiled with instrumentation such as ASan, then the
// accesses in TouchString will fault if they access invalid memory (ie. detect
// either a use-after-free or buffer-overflow). By calling TouchString in each
// of the callbacks, we can check that the arguments meet the API specifications
// in terms of length/null-termination. no_optimize is used to ensure that the
// compiler has to emit actual memory reads, instead of removing them.
static volatile size_t no_optimize = 0;
static void
TouchString(const XML_Char *ptr, int len = -1) {
if (! ptr) {
return;
}
if (len == -1) {
for (XML_Char value = *ptr++; value; value = *ptr++) {
no_optimize += value;
}
} else {
for (int i = 0; i < len; ++i) {
no_optimize += ptr[i];
}
}
}
static void
TouchNodeAndRecurse(XML_Content *content) {
switch (content->type) {
case XML_CTYPE_EMPTY:
case XML_CTYPE_ANY:
assert(content->quant == XML_CQUANT_NONE);
assert(content->name == NULL);
assert(content->numchildren == 0);
assert(content->children == NULL);
break;
case XML_CTYPE_MIXED:
assert(content->quant == XML_CQUANT_NONE
|| content->quant == XML_CQUANT_REP);
assert(content->name == NULL);
for (unsigned int i = 0; i < content->numchildren; ++i) {
assert(content->children[i].type == XML_CTYPE_NAME);
assert(content->children[i].quant == XML_CQUANT_NONE);
assert(content->children[i].numchildren == 0);
assert(content->children[i].children == NULL);
TouchString(content->children[i].name);
}
break;
case XML_CTYPE_NAME:
assert((content->quant == XML_CQUANT_NONE)
|| (content->quant == XML_CQUANT_OPT)
|| (content->quant == XML_CQUANT_REP)
|| (content->quant == XML_CQUANT_PLUS));
assert(content->numchildren == 0);
assert(content->children == NULL);
TouchString(content->name);
break;
case XML_CTYPE_CHOICE:
case XML_CTYPE_SEQ:
assert((content->quant == XML_CQUANT_NONE)
|| (content->quant == XML_CQUANT_OPT)
|| (content->quant == XML_CQUANT_REP)
|| (content->quant == XML_CQUANT_PLUS));
assert(content->name == NULL);
for (unsigned int i = 0; i < content->numchildren; ++i) {
TouchNodeAndRecurse(&content->children[i]);
}
break;
default:
assert(false);
}
}
static void XMLCALL
ElementDeclHandler(void *userData, const XML_Char *name, XML_Content *model) {
TouchString(name);
TouchNodeAndRecurse(model);
XML_FreeContentModel((XML_Parser)userData, model);
}
static void XMLCALL
AttlistDeclHandler(void *userData, const XML_Char *elname,
const XML_Char *attname, const XML_Char *atttype,
const XML_Char *dflt, int isrequired) {
(void)userData;
TouchString(elname);
TouchString(attname);
TouchString(atttype);
TouchString(dflt);
(void)isrequired;
}
static void XMLCALL
XmlDeclHandler(void *userData, const XML_Char *version,
const XML_Char *encoding, int standalone) {
(void)userData;
TouchString(version);
TouchString(encoding);
(void)standalone;
}
static void XMLCALL
StartElementHandler(void *userData, const XML_Char *name,
const XML_Char **atts) {
(void)userData;
TouchString(name);
for (size_t i = 0; atts[i] != NULL; ++i) {
TouchString(atts[i]);
}
}
static void XMLCALL
EndElementHandler(void *userData, const XML_Char *name) {
(void)userData;
TouchString(name);
}
static void XMLCALL
CharacterDataHandler(void *userData, const XML_Char *s, int len) {
(void)userData;
TouchString(s, len);
}
static void XMLCALL
ProcessingInstructionHandler(void *userData, const XML_Char *target,
const XML_Char *data) {
(void)userData;
TouchString(target);
TouchString(data);
}
static void XMLCALL
CommentHandler(void *userData, const XML_Char *data) {
TouchString(data);
// Use the comment handler to trigger parser suspend, so that we can get
// coverage of that code.
XML_StopParser((XML_Parser)userData, XML_TRUE);
}
static void XMLCALL
StartCdataSectionHandler(void *userData) {
(void)userData;
}
static void XMLCALL
EndCdataSectionHandler(void *userData) {
(void)userData;
}
static void XMLCALL
DefaultHandler(void *userData, const XML_Char *s, int len) {
(void)userData;
TouchString(s, len);
}
static void XMLCALL
StartDoctypeDeclHandler(void *userData, const XML_Char *doctypeName,
const XML_Char *sysid, const XML_Char *pubid,
int has_internal_subset) {
(void)userData;
TouchString(doctypeName);
TouchString(sysid);
TouchString(pubid);
(void)has_internal_subset;
}
static void XMLCALL
EndDoctypeDeclHandler(void *userData) {
(void)userData;
}
static void XMLCALL
EntityDeclHandler(void *userData, const XML_Char *entityName,
int is_parameter_entity, const XML_Char *value,
int value_length, const XML_Char *base,
const XML_Char *systemId, const XML_Char *publicId,
const XML_Char *notationName) {
(void)userData;
TouchString(entityName);
(void)is_parameter_entity;
TouchString(value, value_length);
TouchString(base);
TouchString(systemId);
TouchString(publicId);
TouchString(notationName);
}
static void XMLCALL
NotationDeclHandler(void *userData, const XML_Char *notationName,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId) {
(void)userData;
TouchString(notationName);
TouchString(base);
TouchString(systemId);
TouchString(publicId);
}
static void XMLCALL
StartNamespaceDeclHandler(void *userData, const XML_Char *prefix,
const XML_Char *uri) {
(void)userData;
TouchString(prefix);
TouchString(uri);
}
static void XMLCALL
EndNamespaceDeclHandler(void *userData, const XML_Char *prefix) {
(void)userData;
TouchString(prefix);
}
static int XMLCALL
NotStandaloneHandler(void *userData) {
(void)userData;
return XML_STATUS_OK;
}
static int XMLCALL
ExternalEntityRefHandler(XML_Parser parser, const XML_Char *context,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId) {
int rc = XML_STATUS_ERROR;
TouchString(context);
TouchString(base);
TouchString(systemId);
TouchString(publicId);
if (g_external_entity) {
XML_Parser ext_parser
= XML_ExternalEntityParserCreate(parser, context, g_encoding);
rc = Parse(ext_parser, g_external_entity, g_external_entity_size, 1);
XML_ParserFree(ext_parser);
}
return rc;
}
static void XMLCALL
SkippedEntityHandler(void *userData, const XML_Char *entityName,
int is_parameter_entity) {
(void)userData;
TouchString(entityName);
(void)is_parameter_entity;
}
static int XMLCALL
UnknownEncodingHandler(void *encodingHandlerData, const XML_Char *name,
XML_Encoding *info) {
(void)encodingHandlerData;
TouchString(name);
(void)info;
return XML_STATUS_ERROR;
}
void
InitializeParser(XML_Parser parser) {
XML_SetUserData(parser, (void *)parser);
XML_SetHashSalt(parser, 0x41414141);
XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
XML_SetElementDeclHandler(parser, ElementDeclHandler);
XML_SetAttlistDeclHandler(parser, AttlistDeclHandler);
XML_SetXmlDeclHandler(parser, XmlDeclHandler);
XML_SetElementHandler(parser, StartElementHandler, EndElementHandler);
XML_SetCharacterDataHandler(parser, CharacterDataHandler);
XML_SetProcessingInstructionHandler(parser, ProcessingInstructionHandler);
XML_SetCommentHandler(parser, CommentHandler);
XML_SetCdataSectionHandler(parser, StartCdataSectionHandler,
EndCdataSectionHandler);
// XML_SetDefaultHandler disables entity expansion
XML_SetDefaultHandlerExpand(parser, DefaultHandler);
XML_SetDoctypeDeclHandler(parser, StartDoctypeDeclHandler,
EndDoctypeDeclHandler);
// Note: This is mutually exclusive with XML_SetUnparsedEntityDeclHandler,
// and there isn't any significant code change between the two.
XML_SetEntityDeclHandler(parser, EntityDeclHandler);
XML_SetNotationDeclHandler(parser, NotationDeclHandler);
XML_SetNamespaceDeclHandler(parser, StartNamespaceDeclHandler,
EndNamespaceDeclHandler);
XML_SetNotStandaloneHandler(parser, NotStandaloneHandler);
XML_SetExternalEntityRefHandler(parser, ExternalEntityRefHandler);
XML_SetSkippedEntityHandler(parser, SkippedEntityHandler);
XML_SetUnknownEncodingHandler(parser, UnknownEncodingHandler, (void *)parser);
}
DEFINE_TEXT_PROTO_FUZZER(const xml_lpm_fuzzer::Testcase &testcase) {
g_external_entity = nullptr;
if (! testcase.actions_size()) {
return;
}
g_allocation_count = 0;
g_fail_allocations.clear();
for (int i = 0; i < testcase.fail_allocations_size(); ++i) {
g_fail_allocations.push_back(testcase.fail_allocations(i));
}
SetEncoding(testcase.encoding());
XML_Parser parser
= XML_ParserCreate_MM(g_encoding, &memory_handling_suite, "|");
InitializeParser(parser);
for (int i = 0; i < testcase.actions_size(); ++i) {
const auto &action = testcase.actions(i);
switch (action.action_case()) {
case xml_lpm_fuzzer::Action::kChunk:
if (XML_STATUS_ERROR
== Parse(parser, action.chunk().data(), action.chunk().size(), 0)) {
// Force a reset after parse error.
XML_ParserReset(parser, g_encoding);
InitializeParser(parser);
}
break;
case xml_lpm_fuzzer::Action::kLastChunk:
Parse(parser, action.last_chunk().data(), action.last_chunk().size(), 1);
XML_ParserReset(parser, g_encoding);
InitializeParser(parser);
break;
case xml_lpm_fuzzer::Action::kReset:
XML_ParserReset(parser, g_encoding);
InitializeParser(parser);
break;
case xml_lpm_fuzzer::Action::kExternalEntity:
g_external_entity = action.external_entity().data();
g_external_entity_size = action.external_entity().size();
break;
default:
break;
}
}
XML_ParserFree(parser);
}

@ -0,0 +1,58 @@
/*
__ __ _
___\ \/ /_ __ __ _| |_
/ _ \\ /| '_ \ / _` | __|
| __// \| |_) | (_| | |_
\___/_/\_\ .__/ \__,_|\__|
|_| XML parser
Copyright (c) 2022 Mark Brand <markbrand@google.com>
Copyright (c) 2025 Sebastian Pipping <sebastian@pipping.org>
Licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
syntax = "proto2";
package xml_lpm_fuzzer;
enum Encoding {
UTF8 = 0;
UTF16 = 1;
ISO88591 = 2;
ASCII = 3;
UNKNOWN = 4;
NONE = 5;
}
message Action {
oneof action {
string chunk = 1;
string last_chunk = 2;
bool reset = 3;
string external_entity = 4;
}
}
message Testcase {
required Encoding encoding = 1;
repeated Action actions = 2;
repeated int32 fail_allocations = 3;
}

@ -5,7 +5,7 @@
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,

@ -5,7 +5,7 @@
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,

@ -11,7 +11,7 @@
Copyright (c) 2000-2005 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
Copyright (c) 2001-2002 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2002-2016 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016 Cristian Rodríguez <crrodriguez@opensuse.org>
Copyright (c) 2016 Thomas Beutlich <tc@tbeu.de>
Copyright (c) 2017 Rhodri James <rhodri@wildebeest.org.uk>
@ -1067,8 +1067,8 @@ XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled);
See https://semver.org
*/
#define XML_MAJOR_VERSION 2
#define XML_MINOR_VERSION 6
#define XML_MICRO_VERSION 4
#define XML_MINOR_VERSION 7
#define XML_MICRO_VERSION 1
#ifdef __cplusplus
}
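Code that depends on the CVE-2024-8176 fix can check the version macros at compile time. Below is a sketch of such a gate. The `MY_`-prefixed macros are hypothetical. The packed integer assumes minor and micro versions stay below 100. The fallback definitions exist only so that the fragment compiles without `expat.h`.

```c
/* Assumption: real code gets these from <expat.h>; the 2.7.1 values are
   repeated here only so this fragment compiles on its own. */
#ifndef XML_MAJOR_VERSION
#  define XML_MAJOR_VERSION 2
#  define XML_MINOR_VERSION 7
#  define XML_MICRO_VERSION 1
#endif

/* Pack a version into one integer (valid while minor, micro < 100). */
#define MY_EXPAT_VERSION_NUM(v_major, v_minor, v_micro) \
  ((v_major) * 10000L + (v_minor) * 100L + (v_micro))

/* True if the headers in use are at least the given version. */
#define MY_EXPAT_AT_LEAST(v_major, v_minor, v_micro)                      \
  (MY_EXPAT_VERSION_NUM(XML_MAJOR_VERSION, XML_MINOR_VERSION,             \
                        XML_MICRO_VERSION)                                \
   >= MY_EXPAT_VERSION_NUM(v_major, v_minor, v_micro))

/* 2.7.0 is the first release listing the CVE-2024-8176 fix. */
#if MY_EXPAT_AT_LEAST(2, 7, 0)
#  define MY_HAVE_ENTITY_RECURSION_FIX 1
#else
#  define MY_HAVE_ENTITY_RECURSION_FIX 0
#endif
```

This only checks the headers a program was compiled against. At run time, `XML_ExpatVersionInfo()` reports the version of the library actually linked, which may differ.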

@ -28,7 +28,7 @@
Copyright (c) 2002-2003 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
Copyright (c) 2002-2006 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2003 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2018 Yury Gribov <tetra2005@gmail.com>
Copyright (c) 2019 David Loffredo <loffredo@steptools.com>
Copyright (c) 2023-2024 Sony Corporation / Snild Dolkow <snild@sony.com>
@ -127,6 +127,9 @@
# elif ULONG_MAX == 18446744073709551615u // 2^64-1
# define EXPAT_FMT_PTRDIFF_T(midpart) "%" midpart "ld"
# define EXPAT_FMT_SIZE_T(midpart) "%" midpart "lu"
# elif defined(EMSCRIPTEN) // 32bit mode Emscripten
# define EXPAT_FMT_PTRDIFF_T(midpart) "%" midpart "ld"
# define EXPAT_FMT_SIZE_T(midpart) "%" midpart "zu"
# else
# define EXPAT_FMT_PTRDIFF_T(midpart) "%" midpart "d"
# define EXPAT_FMT_SIZE_T(midpart) "%" midpart "u"
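The `EXPAT_FMT_SIZE_T(midpart)` macros build a printf format from adjacent string literals, which the compiler concatenates. Callers can therefore splice a width or flag between `%` and the length modifier. Below is a standalone sketch of that pattern. It uses a hypothetical `MY_FMT_SIZE_T`, which always uses the C99 `z` modifier, the same choice the new 32-bit Emscripten branch makes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for EXPAT_FMT_SIZE_T: "%" midpart "zu" becomes one
   string literal, e.g. MY_FMT_SIZE_T("6") is "%6zu". */
#define MY_FMT_SIZE_T(midpart) "%" midpart "zu"

/* Format a size_t right-aligned in a 6-character field; returns the
   snprintf result (number of characters that would have been written). */
static int format_size_padded(char *buf, size_t bufsize, size_t value) {
  return snprintf(buf, bufsize, MY_FMT_SIZE_T("6"), value);
}
```

The added branch implies that the generic `%u` fallback did not match `size_t` on 32-bit Emscripten. There `size_t` is `unsigned long` rather than `unsigned int`, even though it is 32 bits wide. `%zu` matches `size_t` by definition.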

@ -1,4 +1,4 @@
/* c5625880f4bf417c1463deee4eb92d86ff413f802048621c57e25fe483eb59e4 (2.6.4+)
/* d19ae032c224863c1527ba44d228cc34b99192c3a4c5a27af1f4e054d45ee031 (2.7.1+)
__ __ _
___\ \/ /_ __ __ _| |_
/ _ \\ /| '_ \ / _` | __|
@ -13,7 +13,7 @@
Copyright (c) 2002-2016 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2005-2009 Steven Solie <steven@solie.ca>
Copyright (c) 2016 Eric Rahm <erahm@mozilla.com>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016 Gaurav <g.gupta@samsung.com>
Copyright (c) 2016 Thomas Beutlich <tc@tbeu.de>
Copyright (c) 2016 Gustavo Grieco <gustavo.grieco@imag.fr>
@ -39,7 +39,7 @@
Copyright (c) 2022 Sean McBride <sean@rogue-research.com>
Copyright (c) 2023 Owain Davies <owaind@bath.edu>
Copyright (c) 2023-2024 Sony Corporation / Snild Dolkow <snild@sony.com>
Copyright (c) 2024 Berkay Eren Ürün <berkay.ueruen@siemens.com>
Copyright (c) 2024-2025 Berkay Eren Ürün <berkay.ueruen@siemens.com>
Copyright (c) 2024 Hanno Böck <hanno@gentoo.org>
Licensed under the MIT license:
@ -325,6 +325,10 @@ typedef struct {
const XML_Char *publicId;
const XML_Char *notation;
XML_Bool open;
XML_Bool hasMore; /* true if entity has not been completely processed */
/* An entity can be open while being already completely processed (hasMore ==
XML_FALSE). The reason is the delayed closing of entities until their inner
entities are processed and closed */
XML_Bool is_param;
XML_Bool is_internal; /* true if declared in internal subset outside PE */
} ENTITY;
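According to the 2.7.0 entry, CVE-2024-8176 was fixed by replacing recursion over nested entities with iteration. Below is a toy model of that idea, not Expat's code:

- Each open entity becomes a frame on a heap-allocated stack.
- An `open` flag mirrors `ENTITY.open` to detect recursive references.
- A frame whose text is exhausted stays on the stack until the entity it opened last is closed. This resembles the delayed closing described in the `hasMore` comment in `ENTITY`.

The `&N;` reference syntax and all names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy entity (assumption, not Expat's ENTITY): replacement text may refer
   to entity N of the same table as "&N;" where N is a single digit. */
typedef struct {
  const char *text;
  bool open; /* like ENTITY.open: true while the entity is being expanded */
} toy_entity;

/* One open entity on the explicit stack, with its read position. */
typedef struct {
  toy_entity *entity;
  const char *pos;
} toy_frame;

/* Expand ents[root] into out (NUL-terminated if outcap > 0, truncated to
   outcap - 1 characters). Returns the full expanded length, or -1 for a
   recursive or unknown reference or an allocation failure. Nesting depth
   is limited by heap memory, not by the C call stack. */
static long expand_iterative(toy_entity *ents, int nents, int root, char *out,
                             size_t outcap) {
  size_t cap = 8, depth = 0;
  long len = 0;
  toy_frame *stack;

  if (root < 0 || root >= nents) return -1;
  stack = malloc(cap * sizeof *stack);
  if (!stack) return -1;
  stack[depth++] = (toy_frame){&ents[root], ents[root].text};
  ents[root].open = true;

  while (depth > 0) {
    toy_frame *top = &stack[depth - 1];
    const char *p = top->pos;

    if (*p == '\0') {
      /* Fully processed: close it only now, after every inner entity it
         opened has already been closed and popped. */
      top->entity->open = false;
      depth--;
    } else if (p[0] == '&' && p[1] >= '0' && p[1] <= '9' && p[2] == ';') {
      int idx = p[1] - '0';
      top->pos = p + 3; /* resume here once the inner entity is done */
      if (idx >= nents || ents[idx].open) {
        len = -1; /* unknown entity or recursive reference */
        break;
      }
      if (depth == cap) { /* grow the explicit stack on the heap */
        toy_frame *grown = realloc(stack, 2 * cap * sizeof *stack);
        if (!grown) {
          len = -1;
          break;
        }
        stack = grown;
        cap *= 2;
      }
      stack[depth++] = (toy_frame){&ents[idx], ents[idx].text};
      ents[idx].open = true;
    } else { /* any other character, including a lone '&', is copied */
      if ((size_t)len + 1 < outcap) out[len] = *p;
      len++;
      top->pos = p + 1;
    }
  }

  while (depth > 0) stack[--depth].entity->open = false; /* error cleanup */
  free(stack);
  if (len >= 0 && outcap > 0) {
    out[(size_t)len < outcap ? (size_t)len : outcap - 1] = '\0';
  }
  return len;
}
```

Expat's actual fix is considerably more involved. It covers entities in character data, in attribute values and parameter entities separately. Treat this sketch only as an illustration of trading call-stack depth for heap memory.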
@ -415,6 +419,12 @@ typedef struct {
int *scaffIndex;
} DTD;
enum EntityType {
ENTITY_INTERNAL,
ENTITY_ATTRIBUTE,
ENTITY_VALUE,
};
typedef struct open_internal_entity {
const char *internalEventPtr;
const char *internalEventEndPtr;
@ -422,6 +432,7 @@ typedef struct open_internal_entity {
ENTITY *entity;
int startTagLevel;
XML_Bool betweenDecl; /* WFC: PE Between Declarations */
enum EntityType type;
} OPEN_INTERNAL_ENTITY;
enum XML_Account {
@ -481,8 +492,8 @@ static enum XML_Error doProlog(XML_Parser parser, const ENCODING *enc,
const char *next, const char **nextPtr,
XML_Bool haveMore, XML_Bool allowClosingDoctype,
enum XML_Account account);
static enum XML_Error processInternalEntity(XML_Parser parser, ENTITY *entity,
XML_Bool betweenDecl);
static enum XML_Error processEntity(XML_Parser parser, ENTITY *entity,
XML_Bool betweenDecl, enum EntityType type);
static enum XML_Error doContent(XML_Parser parser, int startTagLevel,
const ENCODING *enc, const char *start,
const char *end, const char **endPtr,
@ -513,18 +524,22 @@ static enum XML_Error storeAttributeValue(XML_Parser parser,
const char *ptr, const char *end,
STRING_POOL *pool,
enum XML_Account account);
static enum XML_Error appendAttributeValue(XML_Parser parser,
const ENCODING *enc,
XML_Bool isCdata, const char *ptr,
const char *end, STRING_POOL *pool,
enum XML_Account account);
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
const char *ptr, const char *end, STRING_POOL *pool,
enum XML_Account account, const char **nextPtr);
static ATTRIBUTE_ID *getAttributeId(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end);
static int setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *elementType);
#if XML_GE == 1
static enum XML_Error storeEntityValue(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end,
enum XML_Account account);
enum XML_Account account,
const char **nextPtr);
static enum XML_Error callStoreEntityValue(XML_Parser parser,
const ENCODING *enc,
const char *start, const char *end,
enum XML_Account account);
#else
static enum XML_Error storeSelfEntityValue(XML_Parser parser, ENTITY *entity);
#endif
@ -709,6 +724,10 @@ struct XML_ParserStruct {
const char *m_positionPtr;
OPEN_INTERNAL_ENTITY *m_openInternalEntities;
OPEN_INTERNAL_ENTITY *m_freeInternalEntities;
OPEN_INTERNAL_ENTITY *m_openAttributeEntities;
OPEN_INTERNAL_ENTITY *m_freeAttributeEntities;
OPEN_INTERNAL_ENTITY *m_openValueEntities;
OPEN_INTERNAL_ENTITY *m_freeValueEntities;
XML_Bool m_defaultExpandInternalEntities;
int m_tagLevel;
ENTITY *m_declEntity;
@ -756,6 +775,7 @@ struct XML_ParserStruct {
ACCOUNTING m_accounting;
ENTITY_STATS m_entity_stats;
#endif
XML_Bool m_reenter;
};
#define MALLOC(parser, s) (parser->m_mem.malloc_fcn((s)))
@ -1028,7 +1048,29 @@ callProcessor(XML_Parser parser, const char *start, const char *end,
#if defined(XML_TESTING)
g_bytesScanned += (unsigned)have_now;
#endif
const enum XML_Error ret = parser->m_processor(parser, start, end, endPtr);
// Run in a loop to avoid dangerously deep recursion
enum XML_Error ret;
*endPtr = start;
while (1) {
// Use endPtr as the new start in each iteration, since it will
// be set to the next start point by m_processor.
ret = parser->m_processor(parser, *endPtr, end, endPtr);
// Make parsing status (and in particular XML_SUSPENDED) take
// precedence over re-enter flag when they disagree
if (parser->m_parsingStatus.parsing != XML_PARSING) {
parser->m_reenter = XML_FALSE;
}
if (! parser->m_reenter) {
break;
}
parser->m_reenter = XML_FALSE;
if (ret != XML_ERROR_NONE)
return ret;
}
if (ret == XML_ERROR_NONE) {
// if we consumed nothing, remember what we had on this parse attempt.
if (*endPtr == start) {
@ -1139,6 +1181,8 @@ parserCreate(const XML_Char *encodingName,
parser->m_freeBindingList = NULL;
parser->m_freeTagList = NULL;
parser->m_freeInternalEntities = NULL;
parser->m_freeAttributeEntities = NULL;
parser->m_freeValueEntities = NULL;
parser->m_groupSize = 0;
parser->m_groupConnector = NULL;
@ -1241,6 +1285,8 @@ parserInit(XML_Parser parser, const XML_Char *encodingName) {
parser->m_eventEndPtr = NULL;
parser->m_positionPtr = NULL;
parser->m_openInternalEntities = NULL;
parser->m_openAttributeEntities = NULL;
parser->m_openValueEntities = NULL;
parser->m_defaultExpandInternalEntities = XML_TRUE;
parser->m_tagLevel = 0;
parser->m_tagStack = NULL;
@ -1251,6 +1297,8 @@ parserInit(XML_Parser parser, const XML_Char *encodingName) {
parser->m_unknownEncodingData = NULL;
parser->m_parentParser = NULL;
parser->m_parsingStatus.parsing = XML_INITIALIZED;
// Reentry can only be triggered inside m_processor calls
parser->m_reenter = XML_FALSE;
#ifdef XML_DTD
parser->m_isParamEntity = XML_FALSE;
parser->m_useForeignDTD = XML_FALSE;
@ -1310,6 +1358,24 @@ XML_ParserReset(XML_Parser parser, const XML_Char *encodingName) {
openEntity->next = parser->m_freeInternalEntities;
parser->m_freeInternalEntities = openEntity;
}
/* move m_openAttributeEntities to m_freeAttributeEntities (i.e. same task but
* for attributes) */
openEntityList = parser->m_openAttributeEntities;
while (openEntityList) {
OPEN_INTERNAL_ENTITY *openEntity = openEntityList;
openEntityList = openEntity->next;
openEntity->next = parser->m_freeAttributeEntities;
parser->m_freeAttributeEntities = openEntity;
}
/* move m_openValueEntities to m_freeValueEntities (i.e. same task but
* for value entities) */
openEntityList = parser->m_openValueEntities;
while (openEntityList) {
OPEN_INTERNAL_ENTITY *openEntity = openEntityList;
openEntityList = openEntity->next;
openEntity->next = parser->m_freeValueEntities;
parser->m_freeValueEntities = openEntity;
}
moveToFreeBindingList(parser, parser->m_inheritedBindings);
FREE(parser, parser->m_unknownEncodingMem);
if (parser->m_unknownEncodingRelease)
@ -1323,6 +1389,19 @@ XML_ParserReset(XML_Parser parser, const XML_Char *encodingName) {
return XML_TRUE;
}
static XML_Bool
parserBusy(XML_Parser parser) {
switch (parser->m_parsingStatus.parsing) {
case XML_PARSING:
case XML_SUSPENDED:
return XML_TRUE;
case XML_INITIALIZED:
case XML_FINISHED:
default:
return XML_FALSE;
}
}
enum XML_Status XMLCALL
XML_SetEncoding(XML_Parser parser, const XML_Char *encodingName) {
if (parser == NULL)
@ -1331,8 +1410,7 @@ XML_SetEncoding(XML_Parser parser, const XML_Char *encodingName) {
XXX There's no way for the caller to determine which of the
XXX possible error cases caused the XML_STATUS_ERROR return.
*/
if (parser->m_parsingStatus.parsing == XML_PARSING
|| parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parserBusy(parser))
return XML_STATUS_ERROR;
/* Get rid of any previous encoding name */
@ -1569,7 +1647,34 @@ XML_ParserFree(XML_Parser parser) {
entityList = entityList->next;
FREE(parser, openEntity);
}
/* free m_openAttributeEntities and m_freeAttributeEntities */
entityList = parser->m_openAttributeEntities;
for (;;) {
OPEN_INTERNAL_ENTITY *openEntity;
if (entityList == NULL) {
if (parser->m_freeAttributeEntities == NULL)
break;
entityList = parser->m_freeAttributeEntities;
parser->m_freeAttributeEntities = NULL;
}
openEntity = entityList;
entityList = entityList->next;
FREE(parser, openEntity);
}
/* free m_openValueEntities and m_freeValueEntities */
entityList = parser->m_openValueEntities;
for (;;) {
OPEN_INTERNAL_ENTITY *openEntity;
if (entityList == NULL) {
if (parser->m_freeValueEntities == NULL)
break;
entityList = parser->m_freeValueEntities;
parser->m_freeValueEntities = NULL;
}
openEntity = entityList;
entityList = entityList->next;
FREE(parser, openEntity);
}
destroyBindings(parser->m_freeBindingList, parser);
destroyBindings(parser->m_inheritedBindings, parser);
poolDestroy(&parser->m_tempPool);
@ -1611,8 +1716,7 @@ XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD) {
return XML_ERROR_INVALID_ARGUMENT;
#ifdef XML_DTD
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (parser->m_parsingStatus.parsing == XML_PARSING
|| parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parserBusy(parser))
return XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING;
parser->m_useForeignDTD = useDTD;
return XML_ERROR_NONE;
@ -1627,8 +1731,7 @@ XML_SetReturnNSTriplet(XML_Parser parser, int do_nst) {
if (parser == NULL)
return;
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (parser->m_parsingStatus.parsing == XML_PARSING
|| parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parserBusy(parser))
return;
parser->m_ns_triplets = do_nst ? XML_TRUE : XML_FALSE;
}
@ -1897,8 +2000,7 @@ XML_SetParamEntityParsing(XML_Parser parser,
if (parser == NULL)
return 0;
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (parser->m_parsingStatus.parsing == XML_PARSING
|| parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parserBusy(parser))
return 0;
#ifdef XML_DTD
parser->m_paramEntityParsing = peParsing;
@ -1915,8 +2017,7 @@ XML_SetHashSalt(XML_Parser parser, unsigned long hash_salt) {
if (parser->m_parentParser)
return XML_SetHashSalt(parser->m_parentParser, hash_salt);
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (parser->m_parsingStatus.parsing == XML_PARSING
|| parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parserBusy(parser))
return 0;
parser->m_hash_secret_salt = hash_salt;
return 1;
@ -2230,6 +2331,11 @@ XML_GetBuffer(XML_Parser parser, int len) {
return parser->m_bufferEnd;
}
static void
triggerReenter(XML_Parser parser) {
parser->m_reenter = XML_TRUE;
}
enum XML_Status XMLCALL
XML_StopParser(XML_Parser parser, XML_Bool resumable) {
if (parser == NULL)
@ -2704,8 +2810,9 @@ static enum XML_Error PTRCALL
contentProcessor(XML_Parser parser, const char *start, const char *end,
const char **endPtr) {
enum XML_Error result = doContent(
parser, 0, parser->m_encoding, start, end, endPtr,
(XML_Bool)! parser->m_parsingStatus.finalBuffer, XML_ACCOUNT_DIRECT);
parser, parser->m_parentParser ? 1 : 0, parser->m_encoding, start, end,
endPtr, (XML_Bool)! parser->m_parsingStatus.finalBuffer,
XML_ACCOUNT_DIRECT);
if (result == XML_ERROR_NONE) {
if (! storeRawNames(parser))
return XML_ERROR_NO_MEMORY;
@ -2793,6 +2900,11 @@ externalEntityInitProcessor3(XML_Parser parser, const char *start,
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
case XML_PARSING:
if (parser->m_reenter) {
return XML_ERROR_UNEXPECTED_STATE; // LCOV_EXCL_LINE
}
/* Fall through */
default:
start = next;
}
@ -2966,7 +3078,7 @@ doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
reportDefault(parser, enc, s, next);
break;
}
result = processInternalEntity(parser, entity, XML_FALSE);
result = processEntity(parser, entity, XML_FALSE, ENTITY_INTERNAL);
if (result != XML_ERROR_NONE)
return result;
} else if (parser->m_externalEntityRefHandler) {
@ -3092,7 +3204,9 @@ doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
}
if ((parser->m_tagLevel == 0)
&& (parser->m_parsingStatus.parsing != XML_FINISHED)) {
if (parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parser->m_parsingStatus.parsing == XML_SUSPENDED
|| (parser->m_parsingStatus.parsing == XML_PARSING
&& parser->m_reenter))
parser->m_processor = epilogProcessor;
else
return epilogProcessor(parser, next, end, nextPtr);
@ -3153,7 +3267,9 @@ doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
}
if ((parser->m_tagLevel == 0)
&& (parser->m_parsingStatus.parsing != XML_FINISHED)) {
if (parser->m_parsingStatus.parsing == XML_SUSPENDED)
if (parser->m_parsingStatus.parsing == XML_SUSPENDED
|| (parser->m_parsingStatus.parsing == XML_PARSING
&& parser->m_reenter))
parser->m_processor = epilogProcessor;
else
return epilogProcessor(parser, next, end, nextPtr);
@ -3286,14 +3402,22 @@ doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
break;
/* LCOV_EXCL_STOP */
}
*eventPP = s = next;
switch (parser->m_parsingStatus.parsing) {
case XML_SUSPENDED:
*eventPP = next;
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
*eventPP = next;
return XML_ERROR_ABORTED;
case XML_PARSING:
if (parser->m_reenter) {
*nextPtr = next;
return XML_ERROR_NONE;
}
/* Fall through */
default:;
*eventPP = s = next;
}
}
/* not reached */
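Taken together, these hunks replace recursive entity expansion with a re-entry loop. processEntity now only pushes an OPEN_INTERNAL_ENTITY and, for internal entities, calls triggerReenter. doContent and doProlog return early when m_reenter is set, and the loop added to callProcessor then calls m_processor again. The sketch below shows that trampoline pattern under simplifying assumptions. It uses a toy entity syntax, and ToyParser, toyProcessor and toyExpand are hypothetical names. A fixed-size array stands in for Expat's heap-allocated OPEN_INTERNAL_ENTITY lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical, not Expat's API): lowercase letters are text,
 * an uppercase letter X refers to table[X - 'A'] (no bounds check, so every
 * referenced letter must be defined). */
#define TOY_MAX_DEPTH 64
#define TOY_OUT_CAP 256

typedef struct {
  const char *text; /* replacement text being expanded */
  size_t processed; /* how much of it is consumed, like ENTITY::processed */
} ToyFrame;

typedef struct {
  const char *const *table;
  ToyFrame stack[TOY_MAX_DEPTH]; /* explicit stack instead of C recursion */
  int depth;
  bool reenter; /* plays the role of m_reenter */
  char out[TOY_OUT_CAP];
  size_t outLen;
} ToyParser;

/* One processor call: works on the innermost frame until it finishes or
 * opens a nested entity. It never calls itself; nesting only sets the
 * reenter flag. Returns false on overflow or excessive nesting. */
static bool toyProcessor(ToyParser *p) {
  ToyFrame *top = &p->stack[p->depth - 1];
  while (top->text[top->processed] != '\0') {
    char c = top->text[top->processed++];
    if (c >= 'A' && c <= 'Z') {
      if (p->depth == TOY_MAX_DEPTH)
        return false;
      p->stack[p->depth].text = p->table[c - 'A'];
      p->stack[p->depth].processed = 0;
      p->depth++;
      p->reenter = true; /* like triggerReenter(): come back for the child */
      return true;
    }
    if (p->outLen + 1 >= TOY_OUT_CAP)
      return false;
    p->out[p->outLen++] = c;
  }
  p->depth--;                /* innermost frame done: close it */
  p->reenter = p->depth > 0; /* resume the enclosing frame, if any */
  return true;
}

/* Driver loop, analogous to the while (1) loop added to callProcessor. */
static bool toyExpand(ToyParser *p, const char *const *table,
                      const char *doc) {
  p->table = table;
  p->depth = 1;
  p->stack[0].text = doc;
  p->stack[0].processed = 0;
  p->outLen = 0;
  do {
    p->reenter = false;
    if (! toyProcessor(p))
      return false;
  } while (p->reenter);
  p->out[p->outLen] = '\0';
  return true;
}
```

With table {"bBc", "d"}, expanding "xAy" yields "xbdcy". Each nesting level costs one array slot instead of one C stack frame. The toy rejects self-reference only through its depth limit. Expat itself tracks open entities through ENTITY::open.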
@ -4210,14 +4334,21 @@ doCdataSection(XML_Parser parser, const ENCODING *enc, const char **startPtr,
/* LCOV_EXCL_STOP */
}
*eventPP = s = next;
switch (parser->m_parsingStatus.parsing) {
case XML_SUSPENDED:
*eventPP = next;
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
*eventPP = next;
return XML_ERROR_ABORTED;
case XML_PARSING:
if (parser->m_reenter) {
return XML_ERROR_UNEXPECTED_STATE; // LCOV_EXCL_LINE
}
/* Fall through */
default:;
*eventPP = s = next;
}
}
/* not reached */
@ -4549,7 +4680,7 @@ entityValueInitProcessor(XML_Parser parser, const char *s, const char *end,
}
/* found end of entity value - can store it now */
return storeEntityValue(parser, parser->m_encoding, s, end,
XML_ACCOUNT_DIRECT);
XML_ACCOUNT_DIRECT, NULL);
} else if (tok == XML_TOK_XML_DECL) {
enum XML_Error result;
result = processXmlDecl(parser, 0, start, next);
@ -4676,7 +4807,7 @@ entityValueProcessor(XML_Parser parser, const char *s, const char *end,
break;
}
/* found end of entity value - can store it now */
return storeEntityValue(parser, enc, s, end, XML_ACCOUNT_DIRECT);
return storeEntityValue(parser, enc, s, end, XML_ACCOUNT_DIRECT, NULL);
}
start = next;
}
@ -5119,9 +5250,9 @@ doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end,
#if XML_GE == 1
// This will store the given replacement text in
// parser->m_declEntity->textPtr.
enum XML_Error result
= storeEntityValue(parser, enc, s + enc->minBytesPerChar,
next - enc->minBytesPerChar, XML_ACCOUNT_NONE);
enum XML_Error result = callStoreEntityValue(
parser, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar,
XML_ACCOUNT_NONE);
if (parser->m_declEntity) {
parser->m_declEntity->textPtr = poolStart(&dtd->entityValuePool);
parser->m_declEntity->textLen
@ -5546,7 +5677,7 @@ doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end,
enum XML_Error result;
XML_Bool betweenDecl
= (role == XML_ROLE_PARAM_ENTITY_REF ? XML_TRUE : XML_FALSE);
result = processInternalEntity(parser, entity, betweenDecl);
result = processEntity(parser, entity, betweenDecl, ENTITY_INTERNAL);
if (result != XML_ERROR_NONE)
return result;
handleDefault = XML_FALSE;
@ -5751,6 +5882,12 @@ doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end,
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
case XML_PARSING:
if (parser->m_reenter) {
*nextPtr = next;
return XML_ERROR_NONE;
}
/* Fall through */
default:
s = next;
tok = XmlPrologTok(enc, s, end, &next);
@ -5818,28 +5955,58 @@ epilogProcessor(XML_Parser parser, const char *s, const char *end,
default:
return XML_ERROR_JUNK_AFTER_DOC_ELEMENT;
}
parser->m_eventPtr = s = next;
switch (parser->m_parsingStatus.parsing) {
case XML_SUSPENDED:
parser->m_eventPtr = next;
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
parser->m_eventPtr = next;
return XML_ERROR_ABORTED;
case XML_PARSING:
if (parser->m_reenter) {
return XML_ERROR_UNEXPECTED_STATE; // LCOV_EXCL_LINE
}
/* Fall through */
default:;
parser->m_eventPtr = s = next;
}
}
}
static enum XML_Error
processInternalEntity(XML_Parser parser, ENTITY *entity, XML_Bool betweenDecl) {
const char *textStart, *textEnd;
const char *next;
enum XML_Error result;
OPEN_INTERNAL_ENTITY *openEntity;
processEntity(XML_Parser parser, ENTITY *entity, XML_Bool betweenDecl,
enum EntityType type) {
OPEN_INTERNAL_ENTITY *openEntity, **openEntityList, **freeEntityList;
switch (type) {
case ENTITY_INTERNAL:
parser->m_processor = internalEntityProcessor;
openEntityList = &parser->m_openInternalEntities;
freeEntityList = &parser->m_freeInternalEntities;
break;
case ENTITY_ATTRIBUTE:
openEntityList = &parser->m_openAttributeEntities;
freeEntityList = &parser->m_freeAttributeEntities;
break;
case ENTITY_VALUE:
openEntityList = &parser->m_openValueEntities;
freeEntityList = &parser->m_freeValueEntities;
break;
/* The default case merely serves as a safety net against an
* invalid 'type' value, so the following lines are excluded
* from test coverage.
*
* LCOV_EXCL_START
*/
default:
// Should not reach here
assert(0);
/* LCOV_EXCL_STOP */
}
if (parser->m_freeInternalEntities) {
openEntity = parser->m_freeInternalEntities;
parser->m_freeInternalEntities = openEntity->next;
if (*freeEntityList) {
openEntity = *freeEntityList;
*freeEntityList = openEntity->next;
} else {
openEntity
= (OPEN_INTERNAL_ENTITY *)MALLOC(parser, sizeof(OPEN_INTERNAL_ENTITY));
@ -5847,55 +6014,34 @@ processInternalEntity(XML_Parser parser, ENTITY *entity, XML_Bool betweenDecl) {
return XML_ERROR_NO_MEMORY;
}
entity->open = XML_TRUE;
entity->hasMore = XML_TRUE;
#if XML_GE == 1
entityTrackingOnOpen(parser, entity, __LINE__);
#endif
entity->processed = 0;
openEntity->next = parser->m_openInternalEntities;
parser->m_openInternalEntities = openEntity;
openEntity->next = *openEntityList;
*openEntityList = openEntity;
openEntity->entity = entity;
openEntity->type = type;
openEntity->startTagLevel = parser->m_tagLevel;
openEntity->betweenDecl = betweenDecl;
openEntity->internalEventPtr = NULL;
openEntity->internalEventEndPtr = NULL;
textStart = (const char *)entity->textPtr;
textEnd = (const char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
next = textStart;
if (entity->is_param) {
int tok
= XmlPrologTok(parser->m_internalEncoding, textStart, textEnd, &next);
result = doProlog(parser, parser->m_internalEncoding, textStart, textEnd,
tok, next, &next, XML_FALSE, XML_FALSE,
XML_ACCOUNT_ENTITY_EXPANSION);
} else {
result = doContent(parser, parser->m_tagLevel, parser->m_internalEncoding,
textStart, textEnd, &next, XML_FALSE,
XML_ACCOUNT_ENTITY_EXPANSION);
// Only internal entities make use of the reenter flag,
// so there is no need to set it for other entity types
if (type == ENTITY_INTERNAL) {
triggerReenter(parser);
}
if (result == XML_ERROR_NONE) {
if (textEnd != next && parser->m_parsingStatus.parsing == XML_SUSPENDED) {
entity->processed = (int)(next - textStart);
parser->m_processor = internalEntityProcessor;
} else if (parser->m_openInternalEntities->entity == entity) {
#if XML_GE == 1
entityTrackingOnClose(parser, entity, __LINE__);
#endif /* XML_GE == 1 */
entity->open = XML_FALSE;
parser->m_openInternalEntities = openEntity->next;
/* put openEntity back in list of free instances */
openEntity->next = parser->m_freeInternalEntities;
parser->m_freeInternalEntities = openEntity;
}
}
return result;
return XML_ERROR_NONE;
}
static enum XML_Error PTRCALL
internalEntityProcessor(XML_Parser parser, const char *s, const char *end,
const char **nextPtr) {
UNUSED_P(s);
UNUSED_P(end);
UNUSED_P(nextPtr);
ENTITY *entity;
const char *textStart, *textEnd;
const char *next;
@ -5905,68 +6051,67 @@ internalEntityProcessor(XML_Parser parser, const char *s, const char *end,
return XML_ERROR_UNEXPECTED_STATE;
entity = openEntity->entity;
textStart = ((const char *)entity->textPtr) + entity->processed;
textEnd = (const char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
next = textStart;
if (entity->is_param) {
int tok
= XmlPrologTok(parser->m_internalEncoding, textStart, textEnd, &next);
result = doProlog(parser, parser->m_internalEncoding, textStart, textEnd,
tok, next, &next, XML_FALSE, XML_TRUE,
XML_ACCOUNT_ENTITY_EXPANSION);
} else {
result = doContent(parser, openEntity->startTagLevel,
parser->m_internalEncoding, textStart, textEnd, &next,
XML_FALSE, XML_ACCOUNT_ENTITY_EXPANSION);
}
// If the entity still has unprocessed text, process the next chunk;
// every path through this "if" block returns early
if (entity->hasMore) {
textStart = ((const char *)entity->textPtr) + entity->processed;
textEnd = (const char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
next = textStart;
if (result != XML_ERROR_NONE)
if (entity->is_param) {
int tok
= XmlPrologTok(parser->m_internalEncoding, textStart, textEnd, &next);
result = doProlog(parser, parser->m_internalEncoding, textStart, textEnd,
tok, next, &next, XML_FALSE, XML_FALSE,
XML_ACCOUNT_ENTITY_EXPANSION);
} else {
result = doContent(parser, openEntity->startTagLevel,
parser->m_internalEncoding, textStart, textEnd, &next,
XML_FALSE, XML_ACCOUNT_ENTITY_EXPANSION);
}
if (result != XML_ERROR_NONE)
return result;
// Check whether the entity is complete; if not, record how much of it
// has been processed
if (textEnd != next
&& (parser->m_parsingStatus.parsing == XML_SUSPENDED
|| (parser->m_parsingStatus.parsing == XML_PARSING
&& parser->m_reenter))) {
entity->processed = (int)(next - (const char *)entity->textPtr);
return result;
}
// Entity is complete. We cannot close it here since we first need to
// process its possible inner entities (which are added to
// m_openInternalEntities during the doProlog or doContent calls above)
entity->hasMore = XML_FALSE;
triggerReenter(parser);
return result;
} // End of entity processing, "if" block will return here
if (textEnd != next && parser->m_parsingStatus.parsing == XML_SUSPENDED) {
entity->processed = (int)(next - (const char *)entity->textPtr);
return result;
}
// Remove fully processed openEntity from open entity list.
#if XML_GE == 1
entityTrackingOnClose(parser, entity, __LINE__);
#endif
// openEntity is still the head of m_openInternalEntities: it was read from
// there at the start of this function, and since hasMore was false, no
// doProlog or doContent call ran that could have pushed another entity.
// This means we can directly remove the head of m_openInternalEntities.
assert(parser->m_openInternalEntities == openEntity);
entity->open = XML_FALSE;
parser->m_openInternalEntities = openEntity->next;
parser->m_openInternalEntities = parser->m_openInternalEntities->next;
/* put openEntity back in list of free instances */
openEntity->next = parser->m_freeInternalEntities;
parser->m_freeInternalEntities = openEntity;
// If there are more open entities we want to stop right here and have the
// upcoming call to XML_ResumeParser continue with entity content, or it would
// be ignored altogether.
if (parser->m_openInternalEntities != NULL
&& parser->m_parsingStatus.parsing == XML_SUSPENDED) {
return XML_ERROR_NONE;
}
if (entity->is_param) {
int tok;
parser->m_processor = prologProcessor;
tok = XmlPrologTok(parser->m_encoding, s, end, &next);
return doProlog(parser, parser->m_encoding, s, end, tok, next, nextPtr,
(XML_Bool)! parser->m_parsingStatus.finalBuffer, XML_TRUE,
XML_ACCOUNT_DIRECT);
} else {
parser->m_processor = contentProcessor;
/* see externalEntityContentProcessor vs contentProcessor */
result = doContent(parser, parser->m_parentParser ? 1 : 0,
parser->m_encoding, s, end, nextPtr,
(XML_Bool)! parser->m_parsingStatus.finalBuffer,
XML_ACCOUNT_DIRECT);
if (result == XML_ERROR_NONE) {
if (! storeRawNames(parser))
return XML_ERROR_NO_MEMORY;
}
return result;
if (parser->m_openInternalEntities == NULL) {
parser->m_processor = entity->is_param ? prologProcessor : contentProcessor;
}
triggerReenter(parser);
return XML_ERROR_NONE;
}
static enum XML_Error PTRCALL
@ -5982,8 +6127,70 @@ static enum XML_Error
storeAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
const char *ptr, const char *end, STRING_POOL *pool,
enum XML_Account account) {
enum XML_Error result
= appendAttributeValue(parser, enc, isCdata, ptr, end, pool, account);
const char *next = ptr;
enum XML_Error result = XML_ERROR_NONE;
while (1) {
if (! parser->m_openAttributeEntities) {
result = appendAttributeValue(parser, enc, isCdata, next, end, pool,
account, &next);
} else {
OPEN_INTERNAL_ENTITY *const openEntity = parser->m_openAttributeEntities;
if (! openEntity)
return XML_ERROR_UNEXPECTED_STATE;
ENTITY *const entity = openEntity->entity;
const char *const textStart
= ((const char *)entity->textPtr) + entity->processed;
const char *const textEnd
= (const char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
const char *nextInEntity = textStart;
if (entity->hasMore) {
result = appendAttributeValue(
parser, parser->m_internalEncoding, isCdata, textStart, textEnd,
pool, XML_ACCOUNT_ENTITY_EXPANSION, &nextInEntity);
if (result != XML_ERROR_NONE)
break;
// Check whether the entity is complete; if not, record how much of it
// has been processed. An XML_SUSPENDED check is not required here, as
// appendAttributeValue never suspends the parser.
if (textEnd != nextInEntity) {
entity->processed
= (int)(nextInEntity - (const char *)entity->textPtr);
continue;
}
// Entity is complete. We cannot close it here since we first need to
// process its possible inner entities (which are added to
// m_openAttributeEntities during appendAttributeValue)
entity->hasMore = XML_FALSE;
continue;
} // End of entity processing, "if" block skips the rest
// Remove fully processed openEntity from open entity list.
#if XML_GE == 1
entityTrackingOnClose(parser, entity, __LINE__);
#endif
// openEntity is still the head of m_openAttributeEntities: it was read
// from there at the start of this loop iteration, and since hasMore was
// false, appendAttributeValue was not called and could not have pushed
// another entity. This means we can directly remove the head.
assert(parser->m_openAttributeEntities == openEntity);
entity->open = XML_FALSE;
parser->m_openAttributeEntities = parser->m_openAttributeEntities->next;
/* put openEntity back in list of free instances */
openEntity->next = parser->m_freeAttributeEntities;
parser->m_freeAttributeEntities = openEntity;
}
// Break if an error occurred or there is nothing left to process
if (result || (parser->m_openAttributeEntities == NULL && end == next)) {
break;
}
}
if (result)
return result;
if (! isCdata && poolLength(pool) && poolLastChar(pool) == 0x20)
@ -5996,7 +6203,7 @@ storeAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
const char *ptr, const char *end, STRING_POOL *pool,
enum XML_Account account) {
enum XML_Account account, const char **nextPtr) {
DTD *const dtd = parser->m_dtd; /* save one level of indirection */
#ifndef XML_DTD
UNUSED_P(account);
@ -6014,6 +6221,9 @@ appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
#endif
switch (tok) {
case XML_TOK_NONE:
if (nextPtr) {
*nextPtr = next;
}
return XML_ERROR_NONE;
case XML_TOK_INVALID:
if (enc == parser->m_encoding)
@ -6154,21 +6364,11 @@ appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF;
} else {
enum XML_Error result;
const XML_Char *textEnd = entity->textPtr + entity->textLen;
entity->open = XML_TRUE;
#if XML_GE == 1
entityTrackingOnOpen(parser, entity, __LINE__);
#endif
result = appendAttributeValue(parser, parser->m_internalEncoding,
isCdata, (const char *)entity->textPtr,
(const char *)textEnd, pool,
XML_ACCOUNT_ENTITY_EXPANSION);
#if XML_GE == 1
entityTrackingOnClose(parser, entity, __LINE__);
#endif
entity->open = XML_FALSE;
if (result)
return result;
result = processEntity(parser, entity, XML_FALSE, ENTITY_ATTRIBUTE);
if ((result == XML_ERROR_NONE) && (nextPtr != NULL)) {
*nextPtr = next;
}
return result;
}
} break;
default:
@ -6197,7 +6397,7 @@ appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
static enum XML_Error
storeEntityValue(XML_Parser parser, const ENCODING *enc,
const char *entityTextPtr, const char *entityTextEnd,
enum XML_Account account) {
enum XML_Account account, const char **nextPtr) {
DTD *const dtd = parser->m_dtd; /* save one level of indirection */
STRING_POOL *pool = &(dtd->entityValuePool);
enum XML_Error result = XML_ERROR_NONE;
@ -6215,8 +6415,9 @@ storeEntityValue(XML_Parser parser, const ENCODING *enc,
return XML_ERROR_NO_MEMORY;
}
const char *next;
for (;;) {
const char *next
next
= entityTextPtr; /* XmlEntityValueTok doesn't always set the last arg */
int tok = XmlEntityValueTok(enc, entityTextPtr, entityTextEnd, &next);
@ -6278,16 +6479,8 @@ storeEntityValue(XML_Parser parser, const ENCODING *enc,
} else
dtd->keepProcessing = dtd->standalone;
} else {
entity->open = XML_TRUE;
entityTrackingOnOpen(parser, entity, __LINE__);
result = storeEntityValue(
parser, parser->m_internalEncoding, (const char *)entity->textPtr,
(const char *)(entity->textPtr + entity->textLen),
XML_ACCOUNT_ENTITY_EXPANSION);
entityTrackingOnClose(parser, entity, __LINE__);
entity->open = XML_FALSE;
if (result)
goto endEntityValue;
result = processEntity(parser, entity, XML_FALSE, ENTITY_VALUE);
goto endEntityValue;
}
break;
}
@ -6375,6 +6568,81 @@ endEntityValue:
# ifdef XML_DTD
parser->m_prologState.inEntityValue = oldInEntityValue;
# endif /* XML_DTD */
// If 'nextPtr' is given, it should be updated during the processing
if (nextPtr != NULL) {
*nextPtr = next;
}
return result;
}
static enum XML_Error
callStoreEntityValue(XML_Parser parser, const ENCODING *enc,
const char *entityTextPtr, const char *entityTextEnd,
enum XML_Account account) {
const char *next = entityTextPtr;
enum XML_Error result = XML_ERROR_NONE;
while (1) {
if (! parser->m_openValueEntities) {
result
= storeEntityValue(parser, enc, next, entityTextEnd, account, &next);
} else {
OPEN_INTERNAL_ENTITY *const openEntity = parser->m_openValueEntities;
if (! openEntity)
return XML_ERROR_UNEXPECTED_STATE;
ENTITY *const entity = openEntity->entity;
const char *const textStart
= ((const char *)entity->textPtr) + entity->processed;
const char *const textEnd
= (const char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
const char *nextInEntity = textStart;
if (entity->hasMore) {
result = storeEntityValue(parser, parser->m_internalEncoding, textStart,
textEnd, XML_ACCOUNT_ENTITY_EXPANSION,
&nextInEntity);
if (result != XML_ERROR_NONE)
break;
// Check whether the entity is complete; if not, record how much of it
// has been processed. As in storeAttributeValue, no XML_SUSPENDED check
// is made here: storeEntityValue is not expected to suspend the parser.
if (textEnd != nextInEntity) {
entity->processed
= (int)(nextInEntity - (const char *)entity->textPtr);
continue;
}
// Entity is complete. We cannot close it here since we first need to
// process its possible inner entities (which are added to
// m_openValueEntities during storeEntityValue)
entity->hasMore = XML_FALSE;
continue;
} // End of entity processing, "if" block skips the rest
// Remove fully processed openEntity from open entity list.
# if XML_GE == 1
entityTrackingOnClose(parser, entity, __LINE__);
# endif
// openEntity is still the head of m_openValueEntities: it was read from
// there at the start of this loop iteration, and since hasMore was false,
// storeEntityValue was not called and could not have pushed another
// entity. This means we can directly remove the head.
assert(parser->m_openValueEntities == openEntity);
entity->open = XML_FALSE;
parser->m_openValueEntities = parser->m_openValueEntities->next;
/* put openEntity back in list of free instances */
openEntity->next = parser->m_freeValueEntities;
parser->m_freeValueEntities = openEntity;
}
// Break if an error occurred or there is nothing left to process
if (result
|| (parser->m_openValueEntities == NULL && entityTextEnd == next)) {
break;
}
}
return result;
}
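Attribute values (storeAttributeValue) and entity values (callStoreEntityValue) follow the same idea without a processor. The caller loops over an explicit list of open entities. When an entity's text is exhausted, its hasMore flag is cleared, but it stays on the list until its inner entities are closed. Closed nodes are then recycled through a free list. The sketch below models that loop. Ent, Node, Ctx, appendChunk, expand and ctxFree are hypothetical names. The recursion check only loosely mirrors Expat's use of the open flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model: lowercase letters are text, an uppercase letter
 * X references ents[X - 'A'] (every referenced letter must be defined). */
typedef struct {
  const char *text;
  size_t processed; /* how much of text has been consumed */
  bool open;        /* on the open list; doubles as a recursion guard */
  bool hasMore;     /* text not yet fully consumed */
} Ent;

typedef struct Node {
  Ent *ent;
  struct Node *next;
} Node;

typedef struct {
  Ent *ents;
  Node *openList; /* like m_openAttributeEntities */
  Node *freeList; /* like m_freeAttributeEntities */
  char out[256];
  size_t outLen;
} Ctx;

/* Copies text from s + *pos up to the end or the first reference. A
 * reference pushes the entity onto openList and returns instead of
 * recursing. Returns 0 on success, -1 on recursion, -2 on overflow/OOM. */
static int appendChunk(Ctx *c, const char *s, size_t *pos) {
  while (s[*pos] != '\0') {
    char ch = s[(*pos)++];
    if (ch >= 'A' && ch <= 'Z') {
      Ent *e = &c->ents[ch - 'A'];
      if (e->open)
        return -1;
      Node *n = c->freeList;
      if (n)
        c->freeList = n->next; /* reuse a node, as processEntity does */
      else if (! (n = malloc(sizeof *n)))
        return -2;
      e->open = e->hasMore = true;
      e->processed = 0;
      n->ent = e;
      n->next = c->openList;
      c->openList = n;
      return 0;
    }
    if (c->outLen + 1 >= sizeof c->out)
      return -2;
    c->out[c->outLen++] = ch;
  }
  return 0;
}

/* Iterative driver in the shape of storeAttributeValue. */
static int expand(Ctx *c, const char *input) {
  size_t pos = 0;
  int rc = 0;
  c->outLen = 0;
  for (;;) {
    if (! c->openList) {
      rc = appendChunk(c, input, &pos);
    } else {
      Node *head = c->openList;
      Ent *e = head->ent;
      if (e->hasMore) {
        if ((rc = appendChunk(c, e->text, &e->processed)) != 0)
          break;
        /* Fully consumed, but it stays open until inner entities close. */
        if (e->text[e->processed] == '\0')
          e->hasMore = false;
        continue;
      }
      e->open = false; /* close: move the head node to the free list */
      c->openList = head->next;
      head->next = c->freeList;
      c->freeList = head;
    }
    if (rc || (! c->openList && input[pos] == '\0'))
      break;
  }
  c->out[c->outLen] = '\0';
  return rc;
}

/* Frees both lists, like XML_ParserFree does for its entity lists. */
static void ctxFree(Ctx *c) {
  Node *lists[2] = {c->openList, c->freeList};
  for (int i = 0; i < 2; i++) {
    while (lists[i]) {
      Node *n = lists[i];
      lists[i] = n->next;
      free(n);
    }
  }
  c->openList = c->freeList = NULL;
}
```

Because an entity is only popped once hasMore is false and it is again at the head, the list behaves like the call stack of the old recursive version. The difference is that this list lives on the heap.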
@ -7983,7 +8251,7 @@ entityTrackingReportStats(XML_Parser rootParser, ENTITY *entity,
(void *)rootParser, rootParser->m_entity_stats.countEverOpened,
rootParser->m_entity_stats.currentDepth,
rootParser->m_entity_stats.maximumDepthSeen,
(rootParser->m_entity_stats.currentDepth - 1) * 2, "",
((int)rootParser->m_entity_stats.currentDepth - 1) * 2, "",
entity->is_param ? "%" : "&", entityName, action, entity->textLen,
sourceLine);
}
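The cast added in entityTrackingReportStats changes the type of the width argument. This sketch assumes currentDepth is an unsigned int, which the cast suggests. Under that assumption, `(currentDepth - 1) * 2` is computed in unsigned arithmetic and wraps at 0. The result is passed where the `%*s` field width expects an int. The helpers below (indentOld, indentNew, formatIndent) are hypothetical and only contrast the two expressions.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Old expression: with an unsigned depth, depth - 1 wraps around at 0 and
 * the result is an unsigned int, not the int that "%*s" expects. */
static unsigned int indentOld(unsigned int depth) { return (depth - 1) * 2; }

/* New expression: converting to int first keeps the arithmetic signed. */
static int indentNew(unsigned int depth) { return ((int)depth - 1) * 2; }

/* Formats an indented marker the way a "%*s" debug line would. */
static void formatIndent(char *buf, size_t size, unsigned int depth) {
  snprintf(buf, size, "[%*s]", indentNew(depth), "");
}
```

In printf, a negative width supplied through `*` is treated as a `-` flag followed by a positive width. Depth 0 therefore produces two spaces of left-justified padding rather than an enormous width.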
@ -8542,11 +8810,13 @@ unsignedCharToPrintable(unsigned char c) {
return "\\xFE";
case 255:
return "\\xFF";
// LCOV_EXCL_START
default:
assert(0); /* never gets here */
return "dead code";
}
assert(0); /* never gets here */
// LCOV_EXCL_STOP
}
#endif /* XML_GE == 1 */


@ -360,13 +360,16 @@ END_TEST
START_TEST(test_helper_unsigned_char_to_printable) {
// Smoke test
unsigned char uc = 0;
for (; uc < (unsigned char)-1; uc++) {
for (;; uc++) {
set_subtest("char %u", (unsigned)uc);
const char *const printable = unsignedCharToPrintable(uc);
if (printable == NULL)
fail("unsignedCharToPrintable returned NULL");
else if (strlen(printable) < (size_t)1)
fail("unsignedCharToPrintable returned empty string");
if (uc == (unsigned char)-1) {
break;
}
}
// Two concrete samples
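The old smoke-test loop stopped as soon as uc reached (unsigned char)-1. As a result, the last value (255 with 8-bit chars) was never passed to unsignedCharToPrintable. Writing `uc <= (unsigned char)-1` instead would never terminate, because that condition is always true. The new loop tests for the last value inside the body and breaks before the increment wraps. The hypothetical counters below make the difference visible.

```c
#include <assert.h>
#include <limits.h>

/* Old loop shape: stops before uc reaches UCHAR_MAX, so UCHAR_MAX is skipped. */
static int countOldLoop(void) {
  int visited = 0;
  unsigned char uc = 0;
  for (; uc < (unsigned char)-1; uc++)
    visited++;
  return visited;
}

/* New loop shape: visits UCHAR_MAX, then breaks before uc++ wraps to 0. */
static int countNewLoop(void) {
  int visited = 0;
  unsigned char uc = 0;
  for (;; uc++) {
    visited++;
    if (uc == (unsigned char)-1)
      break;
  }
  return visited;
}
```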


@ -19,6 +19,7 @@
Copyright (c) 2020 Tim Gates <tim.gates@iress.com>
Copyright (c) 2021 Donghee Na <donghee.na@python.org>
Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com>
Copyright (c) 2025 Berkay Eren Ürün <berkay.ueruen@siemens.com>
Licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining
@@ -450,6 +451,31 @@ START_TEST(test_alloc_internal_entity) {
}
END_TEST
START_TEST(test_alloc_parameter_entity) {
const char *text = "<!DOCTYPE foo ["
"<!ENTITY % param1 \"<!ENTITY internal 'some_text'>\">"
"%param1;"
"]> <foo>&internal;content</foo>";
int i;
const int alloc_test_max_repeats = 30;
for (i = 0; i < alloc_test_max_repeats; i++) {
g_allocation_count = i;
XML_SetParamEntityParsing(g_parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
!= XML_STATUS_ERROR)
break;
alloc_teardown();
alloc_setup();
}
g_allocation_count = -1;
if (i == 0)
fail("Parameter entity processed despite duff allocator");
if (i == alloc_test_max_repeats)
fail("Parameter entity not processed at max allocation count");
}
END_TEST
/* Test the robustness against allocation failure of element handling
* Based on test_dtd_default_handling().
*/
@@ -2079,6 +2105,7 @@ make_alloc_test_case(Suite *s) {
tcase_add_test__ifdef_xml_dtd(tc_alloc, test_alloc_external_entity);
tcase_add_test__ifdef_xml_dtd(tc_alloc, test_alloc_ext_entity_set_encoding);
tcase_add_test__ifdef_xml_dtd(tc_alloc, test_alloc_internal_entity);
tcase_add_test__ifdef_xml_dtd(tc_alloc, test_alloc_parameter_entity);
tcase_add_test__ifdef_xml_dtd(tc_alloc, test_alloc_dtd_default_handling);
tcase_add_test(tc_alloc, test_alloc_explicit_encoding);
tcase_add_test(tc_alloc, test_alloc_set_base);

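`test_alloc_parameter_entity` follows the allocation-failure pattern used throughout these tests: allow `i` successful allocations, retry with `i + 1` after every failure, then require that the very first attempt failed (`i == 0` would mean the failing allocator was never hit) and that success arrived before `alloc_test_max_repeats`. A minimal sketch of that idea, with a hypothetical counting allocator standing in for `duff_allocator` and `g_allocation_count`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical failure-injecting allocator: succeeds g_allowed more times;
 * -1 means "never fail". */
static int g_allowed = -1;

static void *counting_malloc(size_t size) {
  if (g_allowed == 0)
    return NULL;
  if (g_allowed > 0)
    g_allowed--;
  return malloc(size);
}

/* Hypothetical operation needing three allocations; cleans up on failure. */
static int build_three(void) {
  void *a = counting_malloc(8), *b = NULL, *c = NULL;
  int ok = 0;
  if (a == NULL)
    goto done;
  b = counting_malloc(8);
  if (b == NULL)
    goto done;
  c = counting_malloc(8);
  if (c == NULL)
    goto done;
  ok = 1;
done:
  free(c);
  free(b);
  free(a);
  return ok;
}

/* Raises the allocation budget until the operation succeeds; returns the
 * first budget that works, or max_repeats if none did. */
static int first_successful_budget(int max_repeats) {
  int i;
  for (i = 0; i < max_repeats; i++) {
    g_allowed = i;
    if (build_three())
      break;
  }
  g_allowed = -1; /* restore a non-failing allocator */
  return i;
}
```

A result of 0 would correspond to the test's "processed despite duff allocator" failure, and a result equal to `max_repeats` to "not processed at max allocation count".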

@@ -10,7 +10,7 @@
Copyright (c) 2003 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2005-2007 Steven Solie <steven@solie.ca>
Copyright (c) 2005-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2022 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2017 Joe Orton <jorton@redhat.com>
Copyright (c) 2017 José Gutiérrez de la Concha <jose@zeroc.com>
@@ -19,6 +19,7 @@
Copyright (c) 2020 Tim Gates <tim.gates@iress.com>
Copyright (c) 2021 Donghee Na <donghee.na@python.org>
Copyright (c) 2023-2024 Sony Corporation / Snild Dolkow <snild@sony.com>
Copyright (c) 2024-2025 Berkay Eren Ürün <berkay.ueruen@siemens.com>
Licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining
@@ -1191,6 +1192,22 @@ START_TEST(test_not_standalone_handler_accept) {
}
END_TEST
START_TEST(test_entity_start_tag_level_greater_than_one) {
const char *const text = "<!DOCTYPE t1 [\n"
" <!ENTITY e1 'hello'>\n"
"]>\n"
"<t1>\n"
" <t2>&e1;</t2>\n"
"</t1>\n";
XML_Parser parser = XML_ParserCreate(NULL);
assert_true(_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text),
/*isFinal*/ XML_TRUE)
== XML_STATUS_OK);
XML_ParserFree(parser);
}
END_TEST
START_TEST(test_wfc_no_recursive_entity_refs) {
const char *text = "<!DOCTYPE doc [\n"
" <!ENTITY entity '&#38;entity;'>\n"
@@ -1202,6 +1219,93 @@ START_TEST(test_wfc_no_recursive_entity_refs) {
}
END_TEST
START_TEST(test_no_indirectly_recursive_entity_refs) {
struct TestCase {
const char *doc;
bool usesParameterEntities;
};
const struct TestCase cases[] = {
// general entity + character data
{"<!DOCTYPE a [\n"
" <!ENTITY e1 '&e2;'>\n"
" <!ENTITY e2 '&e1;'>\n"
"]><a>&e2;</a>\n",
false},
// general entity + attribute value
{"<!DOCTYPE a [\n"
" <!ENTITY e1 '&e2;'>\n"
" <!ENTITY e2 '&e1;'>\n"
"]><a k1='&e2;' />\n",
false},
// parameter entity
{"<!DOCTYPE doc [\n"
" <!ENTITY % p1 '&#37;p2;'>\n"
" <!ENTITY % p2 '&#37;p1;'>\n"
" <!ENTITY % define_g \"<!ENTITY g '&#37;p2;'>\">\n"
" %define_g;\n"
"]>\n"
"<doc/>\n",
true},
};
const XML_Bool reset_or_not[] = {XML_TRUE, XML_FALSE};
for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
for (size_t j = 0; j < sizeof(reset_or_not) / sizeof(reset_or_not[0]);
j++) {
const XML_Bool reset_wanted = reset_or_not[j];
const char *const doc = cases[i].doc;
const bool usesParameterEntities = cases[i].usesParameterEntities;
set_subtest("[%i,reset=%i] %s", (int)i, (int)j, doc);
#ifdef XML_DTD // both GE and DTD
const bool rejection_expected = true;
#elif XML_GE == 1 // GE but not DTD
const bool rejection_expected = ! usesParameterEntities;
#else // neither DTD nor GE
const bool rejection_expected = false;
#endif
XML_Parser parser = XML_ParserCreate(NULL);
#ifdef XML_DTD
if (usesParameterEntities) {
assert_true(
XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS)
== 1);
}
#else
UNUSED_P(usesParameterEntities);
#endif // XML_DTD
const enum XML_Status status
= _XML_Parse_SINGLE_BYTES(parser, doc, (int)strlen(doc),
/*isFinal*/ XML_TRUE);
if (rejection_expected) {
assert_true(status == XML_STATUS_ERROR);
assert_true(XML_GetErrorCode(parser) == XML_ERROR_RECURSIVE_ENTITY_REF);
} else {
assert_true(status == XML_STATUS_OK);
}
if (reset_wanted) {
// This covers free'ing of (eventually) all three open entity lists by
// XML_ParserReset.
XML_ParserReset(parser, NULL);
}
// This covers free'ing of (eventually) all three open entity lists by
// XML_ParserFree (unless XML_ParserReset has already done that above).
XML_ParserFree(parser);
}
}
}
END_TEST
START_TEST(test_recursive_external_parameter_entity_2) {
struct TestCase {
const char *doc;
@@ -1417,7 +1521,9 @@ START_TEST(test_suspend_parser_between_char_data_calls) {
XML_SetCharacterDataHandler(g_parser, clearing_aborting_character_handler);
g_resumable = XML_TRUE;
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
if (XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE)
!= XML_STATUS_SUSPENDED)
xml_failure(g_parser);
if (XML_GetErrorCode(g_parser) != XML_ERROR_NONE)
@@ -1446,7 +1552,9 @@ START_TEST(test_repeated_stop_parser_between_char_data_calls) {
XML_SetCharacterDataHandler(g_parser, parser_stop_character_handler);
g_resumable = XML_TRUE;
g_abortable = XML_FALSE;
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
if (XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE)
!= XML_STATUS_SUSPENDED)
fail("Failed to double-suspend parser");
@@ -1830,12 +1938,19 @@ END_TEST
/* Test suspending the parser in cdata handler */
START_TEST(test_suspend_parser_between_cdata_calls) {
if (g_chunkSize != 0) {
// this test does not use SINGLE_BYTES, because of suspension
return;
}
const char *text = long_cdata_text;
enum XML_Status result;
XML_SetCharacterDataHandler(g_parser, clearing_aborting_character_handler);
g_resumable = XML_TRUE;
result = _XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE);
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
result = XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE);
if (result != XML_STATUS_SUSPENDED) {
if (result == XML_STATUS_ERROR)
xml_failure(g_parser);
@@ -2378,6 +2493,11 @@ END_TEST
* entity. Exercises some obscure code in XML_ParserReset().
*/
START_TEST(test_reset_in_entity) {
if (g_chunkSize != 0) {
// this test does not use SINGLE_BYTES, because of suspension
return;
}
const char *text = "<!DOCTYPE doc [\n"
"<!ENTITY wombat 'wom'>\n"
"<!ENTITY entity 'hi &wom; there'>\n"
@@ -2387,7 +2507,9 @@ START_TEST(test_reset_in_entity) {
g_resumable = XML_TRUE;
XML_SetCharacterDataHandler(g_parser, clearing_aborting_character_handler);
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
if (XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE)
== XML_STATUS_ERROR)
xml_failure(g_parser);
XML_GetParsingStatus(g_parser, &status);
@@ -3634,7 +3756,9 @@ START_TEST(test_suspend_xdecl) {
XML_SetXmlDeclHandler(g_parser, entity_suspending_xdecl_handler);
XML_SetUserData(g_parser, g_parser);
g_resumable = XML_TRUE;
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
if (XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE)
!= XML_STATUS_SUSPENDED)
xml_failure(g_parser);
if (XML_GetErrorCode(g_parser) != XML_ERROR_NONE)
@@ -3830,13 +3954,20 @@ END_TEST
/* Test syntax error is caught at parse resumption */
START_TEST(test_resume_entity_with_syntax_error) {
if (g_chunkSize != 0) {
// this test does not use SINGLE_BYTES, because of suspension
return;
}
const char *text = "<!DOCTYPE doc [\n"
"<!ENTITY foo '<suspend>Hi</wombat>'>\n"
"]>\n"
"<doc>&foo;</doc>\n";
XML_SetStartElementHandler(g_parser, start_element_suspender);
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_TRUE)
// can't use SINGLE_BYTES here, because it'll return early on suspension, and
// we won't know exactly how much input we actually managed to give Expat.
if (XML_Parse(g_parser, text, (int)strlen(text), XML_TRUE)
!= XML_STATUS_SUSPENDED)
xml_failure(g_parser);
if (XML_ResumeParser(g_parser) != XML_STATUS_ERROR)
@@ -3960,7 +4091,7 @@ START_TEST(test_skipped_null_loaded_ext_entity) {
= {"<!ENTITY % pe1 SYSTEM 'http://example.org/two.ent'>\n"
"<!ENTITY % pe2 '%pe1;'>\n"
"%pe2;\n",
external_entity_null_loader};
external_entity_null_loader, NULL};
XML_SetUserData(g_parser, &test_data);
XML_SetParamEntityParsing(g_parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
@@ -3978,7 +4109,7 @@ START_TEST(test_skipped_unloaded_ext_entity) {
= {"<!ENTITY % pe1 SYSTEM 'http://example.org/two.ent'>\n"
"<!ENTITY % pe2 '%pe1;'>\n"
"%pe2;\n",
NULL};
NULL, NULL};
XML_SetUserData(g_parser, &test_data);
XML_SetParamEntityParsing(g_parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
@@ -5278,6 +5409,151 @@ START_TEST(test_pool_integrity_with_unfinished_attr) {
}
END_TEST
/* Test a possible early return location in internalEntityProcessor */
START_TEST(test_entity_ref_no_elements) {
const char *const text = "<!DOCTYPE foo [\n"
"<!ENTITY e1 \"test\">\n"
"]> <foo>&e1;"; // intentionally missing newline
XML_Parser parser = XML_ParserCreate(NULL);
assert_true(_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text), XML_TRUE)
== XML_STATUS_ERROR);
assert_true(XML_GetErrorCode(parser) == XML_ERROR_NO_ELEMENTS);
XML_ParserFree(parser);
}
END_TEST
/* Tests if chained entity references lead to unbounded recursion */
START_TEST(test_deep_nested_entity) {
const size_t N_LINES = 60000;
const size_t SIZE_PER_LINE = 50;
char *const text = (char *)malloc((N_LINES + 4) * SIZE_PER_LINE);
if (text == NULL) {
fail("malloc failed");
}
char *textPtr = text;
// Create the XML
textPtr += snprintf(textPtr, SIZE_PER_LINE,
"<!DOCTYPE foo [\n"
" <!ENTITY s0 'deepText'>\n");
for (size_t i = 1; i < N_LINES; ++i) {
textPtr += snprintf(textPtr, SIZE_PER_LINE, " <!ENTITY s%lu '&s%lu;'>\n",
(long unsigned)i, (long unsigned)(i - 1));
}
snprintf(textPtr, SIZE_PER_LINE, "]> <foo>&s%lu;</foo>\n",
(long unsigned)(N_LINES - 1));
const XML_Char *const expected = XCS("deepText");
CharData storage;
CharData_Init(&storage);
XML_Parser parser = XML_ParserCreate(NULL);
XML_SetCharacterDataHandler(parser, accumulate_characters);
XML_SetUserData(parser, &storage);
if (_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text), XML_TRUE)
== XML_STATUS_ERROR)
xml_failure(parser);
CharData_CheckXMLChars(&storage, expected);
XML_ParserFree(parser);
free(text);
}
END_TEST
/* Tests if chained entity references in attributes
lead to unbounded recursion */
START_TEST(test_deep_nested_attribute_entity) {
const size_t N_LINES = 60000;
const size_t SIZE_PER_LINE = 100;
char *const text = (char *)malloc((N_LINES + 4) * SIZE_PER_LINE);
if (text == NULL) {
fail("malloc failed");
}
char *textPtr = text;
// Create the XML
textPtr += snprintf(textPtr, SIZE_PER_LINE,
"<!DOCTYPE foo [\n"
" <!ENTITY s0 'deepText'>\n");
for (size_t i = 1; i < N_LINES; ++i) {
textPtr += snprintf(textPtr, SIZE_PER_LINE, " <!ENTITY s%lu '&s%lu;'>\n",
(long unsigned)i, (long unsigned)(i - 1));
}
snprintf(textPtr, SIZE_PER_LINE, "]> <foo name='&s%lu;'>mainText</foo>\n",
(long unsigned)(N_LINES - 1));
AttrInfo doc_info[] = {{XCS("name"), XCS("deepText")}, {NULL, NULL}};
ElementInfo info[] = {{XCS("foo"), 1, NULL, NULL}, {NULL, 0, NULL, NULL}};
info[0].attributes = doc_info;
XML_Parser parser = XML_ParserCreate(NULL);
ParserAndElementInfo parserPlusElemenInfo = {parser, info};
XML_SetStartElementHandler(parser, counting_start_element_handler);
XML_SetUserData(parser, &parserPlusElemenInfo);
if (_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text), XML_TRUE)
== XML_STATUS_ERROR)
xml_failure(parser);
XML_ParserFree(parser);
free(text);
}
END_TEST
START_TEST(test_deep_nested_entity_delayed_interpretation) {
const size_t N_LINES = 70000;
const size_t SIZE_PER_LINE = 100;
char *const text = (char *)malloc((N_LINES + 4) * SIZE_PER_LINE);
if (text == NULL) {
fail("malloc failed");
}
char *textPtr = text;
// Create the XML
textPtr += snprintf(textPtr, SIZE_PER_LINE,
"<!DOCTYPE foo [\n"
" <!ENTITY %% s0 'deepText'>\n");
for (size_t i = 1; i < N_LINES; ++i) {
textPtr += snprintf(textPtr, SIZE_PER_LINE,
" <!ENTITY %% s%lu '&#37;s%lu;'>\n", (long unsigned)i,
(long unsigned)(i - 1));
}
snprintf(textPtr, SIZE_PER_LINE,
" <!ENTITY %% define_g \"<!ENTITY g '&#37;s%lu;'>\">\n"
" %%define_g;\n"
"]>\n"
"<foo/>\n",
(long unsigned)(N_LINES - 1));
XML_Parser parser = XML_ParserCreate(NULL);
XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
if (_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text), XML_TRUE)
== XML_STATUS_ERROR)
xml_failure(parser);
XML_ParserFree(parser);
free(text);
}
END_TEST
START_TEST(test_nested_entity_suspend) {
const char *const text = "<!DOCTYPE a [\n"
" <!ENTITY e1 '<!--e1-->'>\n"
@@ -5308,6 +5584,35 @@ START_TEST(test_nested_entity_suspend) {
}
END_TEST
START_TEST(test_nested_entity_suspend_2) {
const char *const text = "<!DOCTYPE doc [\n"
" <!ENTITY ge1 'head1Ztail1'>\n"
" <!ENTITY ge2 'head2&ge1;tail2'>\n"
" <!ENTITY ge3 'head3&ge2;tail3'>\n"
"]>\n"
"<doc>&ge3;</doc>";
const XML_Char *const expected = XCS("head3") XCS("head2") XCS("head1")
XCS("Z") XCS("tail1") XCS("tail2") XCS("tail3");
CharData storage;
CharData_Init(&storage);
XML_Parser parser = XML_ParserCreate(NULL);
ParserPlusStorage parserPlusStorage = {parser, &storage};
XML_SetCharacterDataHandler(parser, accumulate_char_data_and_suspend);
XML_SetUserData(parser, &parserPlusStorage);
enum XML_Status status = XML_Parse(parser, text, (int)strlen(text), XML_TRUE);
while (status == XML_STATUS_SUSPENDED) {
status = XML_ResumeParser(parser);
}
if (status != XML_STATUS_OK)
xml_failure(parser);
CharData_CheckXMLChars(&storage, expected);
XML_ParserFree(parser);
}
END_TEST
/* Regression test for quadratic parsing on large tokens */
START_TEST(test_big_tokens_scale_linearly) {
const struct {
@@ -5968,7 +6273,9 @@ make_basic_test_case(Suite *s) {
tcase_add_test(tc_basic, test_wfc_undeclared_entity_with_external_subset);
tcase_add_test(tc_basic, test_not_standalone_handler_reject);
tcase_add_test(tc_basic, test_not_standalone_handler_accept);
tcase_add_test(tc_basic, test_entity_start_tag_level_greater_than_one);
tcase_add_test__if_xml_ge(tc_basic, test_wfc_no_recursive_entity_refs);
tcase_add_test(tc_basic, test_no_indirectly_recursive_entity_refs);
tcase_add_test__ifdef_xml_dtd(tc_basic, test_ext_entity_invalid_parse);
tcase_add_test__if_xml_ge(tc_basic, test_dtd_default_handling);
tcase_add_test(tc_basic, test_dtd_attr_handling);
@@ -6147,7 +6454,13 @@ make_basic_test_case(Suite *s) {
tcase_add_test(tc_basic, test_empty_element_abort);
tcase_add_test__ifdef_xml_dtd(tc_basic,
test_pool_integrity_with_unfinished_attr);
tcase_add_test__if_xml_ge(tc_basic, test_entity_ref_no_elements);
tcase_add_test__if_xml_ge(tc_basic, test_deep_nested_entity);
tcase_add_test__if_xml_ge(tc_basic, test_deep_nested_attribute_entity);
tcase_add_test__if_xml_ge(tc_basic,
test_deep_nested_entity_delayed_interpretation);
tcase_add_test__if_xml_ge(tc_basic, test_nested_entity_suspend);
tcase_add_test__if_xml_ge(tc_basic, test_nested_entity_suspend_2);
tcase_add_test(tc_basic, test_big_tokens_scale_linearly);
tcase_add_test(tc_basic, test_set_reparse_deferral);
tcase_add_test(tc_basic, test_reparse_deferral_is_inherited);

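`test_deep_nested_entity` and its siblings build their input with one `snprintf` per line, each bounded only by `SIZE_PER_LINE`, and advance `textPtr` by the return value without checking for truncation; that works because every generated line is short. If the generator is adapted, a variant that tracks the remaining capacity of the whole buffer and fails on truncation could look like this (`append_fmt` and `build_chained_entities` are hypothetical helpers, not part of the test suite):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Appends printf-style output at buf + *pos; returns 0 on error or if the
 * text would not fit in the remaining capacity (cap - *pos). */
static int append_fmt(char *buf, size_t cap, size_t *pos, const char *fmt,
                      ...) {
  va_list ap;
  va_start(ap, fmt);
  const int n = vsnprintf(buf + *pos, cap - *pos, fmt, ap);
  va_end(ap);
  if (n < 0 || (size_t)n >= cap - *pos)
    return 0;
  *pos += (size_t)n;
  return 1;
}

/* Builds a DOCTYPE with entities s0..s(n-1), where s0 is 'deepText' and each
 * s_i refers to s_(i-1), plus a root element referencing the last one.
 * Returns a malloc'd string (caller frees) or NULL. */
static char *build_chained_entities(size_t n_entities) {
  if (n_entities == 0)
    return NULL;
  const size_t cap = (n_entities + 4) * 50; /* same budget as the test */
  char *const buf = malloc(cap);
  if (buf == NULL)
    return NULL;
  size_t pos = 0;
  int ok = append_fmt(buf, cap, &pos,
                      "<!DOCTYPE foo [\n <!ENTITY s0 'deepText'>\n");
  for (size_t i = 1; ok && i < n_entities; ++i)
    ok = append_fmt(buf, cap, &pos, " <!ENTITY s%lu '&s%lu;'>\n",
                    (unsigned long)i, (unsigned long)(i - 1));
  if (ok)
    ok = append_fmt(buf, cap, &pos, "]> <foo>&s%lu;</foo>\n",
                    (unsigned long)(n_entities - 1));
  if (! ok) {
    free(buf);
    return NULL;
  }
  return buf;
}
```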

@@ -8,7 +8,7 @@
Copyright (c) 2003-2006 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2005-2007 Steven Solie <steven@solie.ca>
Copyright (c) 2017-2023 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017 Rhodri James <rhodri@wildebeest.org.uk>
Licensed under the MIT license:
@@ -32,10 +32,18 @@
USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#define _POSIX_C_SOURCE 1 // fdopen
#if defined(_MSC_VER)
# include <io.h> // _open, _close
#else
# include <unistd.h> // close
#endif
#include <fcntl.h> // open
#include <sys/stat.h>
#include <assert.h>
#include <stddef.h> // ptrdiff_t
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include "expat.h"
@@ -52,17 +60,18 @@
# define XML_FMT_STR "s"
#endif
static void
static int
usage(const char *prog, int rc) {
fprintf(stderr, "usage: %s [-n] filename bufferSize nr_of_loops\n", prog);
exit(rc);
return rc;
}
int
main(int argc, char *argv[]) {
XML_Parser parser;
char *XMLBuf, *XMLBufEnd, *XMLBufPtr;
FILE *fd;
int fd;
FILE *file;
struct stat fileAttr;
int nrOfLoops, bufferSize, i, isFinal;
size_t fileSize;
@@ -76,34 +85,48 @@ main(int argc, char *argv[]) {
ns = 1;
j = 1;
} else
usage(argv[0], 1);
return usage(argv[0], 1);
}
}
if (argc != j + 4)
usage(argv[0], 1);
return usage(argv[0], 1);
if (stat(argv[j + 1], &fileAttr) != 0) {
fprintf(stderr, "could not access file '%s'\n", argv[j + 1]);
fd = open(argv[j + 1], O_RDONLY);
if (fd == -1) {
fprintf(stderr, "could not open file '%s'\n", argv[j + 1]);
return 2;
}
fd = fopen(argv[j + 1], "r");
if (! fd) {
fprintf(stderr, "could not open file '%s'\n", argv[j + 1]);
exit(2);
if (fstat(fd, &fileAttr) != 0) {
close(fd);
fprintf(stderr, "could not fstat file '%s'\n", argv[j + 1]);
return 2;
}
file = fdopen(fd, "r");
if (! file) {
close(fd);
fprintf(stderr, "could not fdopen file '%s'\n", argv[j + 1]);
return 2;
}
bufferSize = atoi(argv[j + 2]);
nrOfLoops = atoi(argv[j + 3]);
if (bufferSize <= 0 || nrOfLoops <= 0) {
fclose(file); // NOTE: this closes fd as well
fprintf(stderr, "buffer size and nr of loops must be greater than zero.\n");
exit(3);
return 3;
}
XMLBuf = malloc(fileAttr.st_size);
fileSize = fread(XMLBuf, sizeof(char), fileAttr.st_size, fd);
fclose(fd);
if (XMLBuf == NULL) {
fclose(file); // NOTE: this closes fd as well
fprintf(stderr, "out of memory.\n");
return 5;
}
fileSize = fread(XMLBuf, sizeof(char), fileAttr.st_size, file);
fclose(file); // NOTE: this closes fd as well
if (ns)
parser = XML_ParserCreateNS(NULL, '!');
@@ -132,7 +155,7 @@ main(int argc, char *argv[]) {
XML_GetCurrentColumnNumber(parser));
free(XMLBuf);
XML_ParserFree(parser);
exit(4);
return 4;
}
XMLBufPtr += bufferSize;
} while (! isFinal);

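`benchmark.c` now opens the file first and calls `fstat()` on the descriptor, instead of calling `stat()` on the path and then `fopen()`, so the size used for `malloc` and `fread` describes the file that is actually read (presumably to close a time-of-check/time-of-use window). Once `fdopen()` succeeds, the `FILE *` owns the descriptor, which is why the later error paths use `fclose(file)` instead of `close(fd)`. A standalone POSIX sketch of the same sequence (`read_whole_file` is a hypothetical name; the file itself uses `_POSIX_C_SOURCE 1`, while this sketch asks for POSIX.1-2008):

```c
#define _POSIX_C_SOURCE 200809L /* fdopen() and other POSIX.1-2008 calls */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reads a whole file into a NUL-terminated malloc'd buffer. The size comes
 * from fstat() on the already-open descriptor, so it describes the same file
 * that is read. Returns NULL on error; on success *out_size holds the number
 * of bytes read. */
static char *read_whole_file(const char *path, size_t *out_size) {
  const int fd = open(path, O_RDONLY);
  if (fd == -1)
    return NULL;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size < 0) {
    close(fd);
    return NULL;
  }
  FILE *const file = fdopen(fd, "r");
  if (file == NULL) {
    close(fd); /* fdopen failed, so fd is still ours to close */
    return NULL;
  }
  /* From here on, fclose(file) also closes fd. */
  char *const buf = malloc((size_t)st.st_size + 1);
  if (buf == NULL) {
    fclose(file);
    return NULL;
  }
  const size_t n = fread(buf, 1, (size_t)st.st_size, file);
  fclose(file);
  buf[n] = '\0';
  *out_size = n;
  return buf;
}
```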

@@ -10,7 +10,7 @@
Copyright (c) 2003 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2005-2007 Steven Solie <steven@solie.ca>
Copyright (c) 2005-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2022 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2017 Joe Orton <jorton@redhat.com>
Copyright (c) 2017 José Gutiérrez de la Concha <jose@zeroc.com>
@@ -42,6 +42,8 @@
*/
#include <assert.h>
#include <errno.h>
#include <stdint.h> // for SIZE_MAX
#include <stdio.h>
#include <string.h>
@@ -202,6 +204,12 @@ _XML_Parse_SINGLE_BYTES(XML_Parser parser, const char *s, int len,
for (; len > chunksize; len -= chunksize, s += chunksize) {
enum XML_Status res = XML_Parse(parser, s, chunksize, XML_FALSE);
if (res != XML_STATUS_OK) {
if ((res == XML_STATUS_SUSPENDED) && (len > chunksize)) {
fail("Use of function _XML_Parse_SINGLE_BYTES with a chunk size "
"greater than 0 (from g_chunkSize) does not work well with "
"suspension. Please consider use of plain XML_Parse at this "
"place in your test, instead.");
}
return res;
}
}
@@ -294,3 +302,26 @@ duff_reallocator(void *ptr, size_t size) {
g_reallocation_count--;
return realloc(ptr, size);
}
// Portable remake of strndup(3) for C99; does not care about space efficiency
char *
portable_strndup(const char *s, size_t n) {
if ((s == NULL) || (n == SIZE_MAX)) {
errno = EINVAL;
return NULL;
}
char *const buffer = (char *)malloc(n + 1);
if (buffer == NULL) {
errno = ENOMEM;
return NULL;
}
errno = 0;
memcpy(buffer, s, n);
buffer[n] = '\0';
return buffer;
}

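`portable_strndup` always `memcpy`s exactly `n` bytes, so unlike POSIX `strndup(3)` it does not stop at a NUL byte and requires `n` readable bytes at `s`. That suits `dup_original_string`, which passes a byte range meant to lie inside the buffer returned by `XML_GetInputContext`, but it matters if the helper is reused elsewhere. A variant that keeps the stop-at-NUL behavior might look like this (`bounded_strndup` is hypothetical and, unlike the original, does not set `errno`):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies at most n bytes of s, stopping early at a NUL like strndup(3), and
 * never reads past that NUL. Returns NULL if s is NULL or malloc fails. */
static char *bounded_strndup(const char *s, size_t n) {
  if (s == NULL)
    return NULL;
  size_t len = 0;
  while (len < n && s[len] != '\0')
    len++;
  if (len == SIZE_MAX)
    return NULL; /* len + 1 would overflow */
  char *const copy = malloc(len + 1);
  if (copy == NULL)
    return NULL;
  memcpy(copy, s, len);
  copy[len] = '\0';
  return copy;
}
```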

@@ -10,7 +10,7 @@
Copyright (c) 2003 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2005-2007 Steven Solie <steven@solie.ca>
Copyright (c) 2005-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2022 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2017 Joe Orton <jorton@redhat.com>
Copyright (c) 2017 José Gutiérrez de la Concha <jose@zeroc.com>
@@ -146,6 +146,8 @@ extern void *duff_allocator(size_t size);
extern void *duff_reallocator(void *ptr, size_t size);
extern char *portable_strndup(const char *s, size_t n);
#endif /* XML_COMMON_H */
#ifdef __cplusplus


@@ -1842,6 +1842,15 @@ element_decl_suspender(void *userData, const XML_Char *name,
XML_FreeContentModel(g_parser, model);
}
void XMLCALL
suspend_after_element_declaration(void *userData, const XML_Char *name,
XML_Content *model) {
UNUSED_P(name);
XML_Parser parser = (XML_Parser)userData;
assert_true(XML_StopParser(parser, /*resumable*/ XML_TRUE) == XML_STATUS_OK);
XML_FreeContentModel(parser, model);
}
void XMLCALL
accumulate_pi_characters(void *userData, const XML_Char *target,
const XML_Char *data) {
@@ -1882,6 +1891,20 @@ accumulate_entity_decl(void *userData, const XML_Char *entityName,
CharData_AppendXMLChars(storage, XCS("\n"), 1);
}
void XMLCALL
accumulate_char_data_and_suspend(void *userData, const XML_Char *s, int len) {
ParserPlusStorage *const parserPlusStorage = (ParserPlusStorage *)userData;
CharData_AppendXMLChars(parserPlusStorage->storage, s, len);
for (int i = 0; i < len; i++) {
if (s[i] == 'Z') {
XML_StopParser(parserPlusStorage->parser, /*resumable=*/XML_TRUE);
break;
}
}
}
void XMLCALL
accumulate_start_element(void *userData, const XML_Char *name,
const XML_Char **atts) {


@@ -325,6 +325,7 @@ extern int XMLCALL external_entity_devaluer(XML_Parser parser,
typedef struct ext_hdlr_data {
const char *parse_text;
XML_ExternalEntityRefHandler handler;
CharData *storage;
} ExtHdlrData;
extern int XMLCALL external_entity_oneshot_loader(XML_Parser parser,
@@ -557,6 +558,10 @@ extern void XMLCALL suspending_comment_handler(void *userData,
extern void XMLCALL element_decl_suspender(void *userData, const XML_Char *name,
XML_Content *model);
extern void XMLCALL suspend_after_element_declaration(void *userData,
const XML_Char *name,
XML_Content *model);
extern void XMLCALL accumulate_pi_characters(void *userData,
const XML_Char *target,
const XML_Char *data);
@@ -569,6 +574,10 @@ extern void XMLCALL accumulate_entity_decl(
const XML_Char *systemId, const XML_Char *publicId,
const XML_Char *notationName);
extern void XMLCALL accumulate_char_data_and_suspend(void *userData,
const XML_Char *s,
int len);
extern void XMLCALL accumulate_start_element(void *userData,
const XML_Char *name,
const XML_Char **atts);

View file

@@ -14,7 +14,7 @@
Copyright (c) 2004-2006 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
Copyright (c) 2006-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2022 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2023-2024 Sony Corporation / Snild Dolkow <snild@sony.com>
Licensed under the MIT license:
@@ -129,8 +129,10 @@ void _check_set_test_info(char const *function, char const *filename,
* Prototypes for the actual implementation.
*/
# if defined(__GNUC__)
# if defined(__has_attribute)
# if __has_attribute(noreturn)
__attribute__((noreturn))
# endif
# endif
void
_fail(const char *file, int line, const char *msg);

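The `minicheck.h` hunk applies `__attribute__((noreturn))` to `_fail` only when the compiler confirms support through `__has_attribute`, and it does so with nested `#if` lines. On a preprocessor that does not know `__has_attribute`, a single-line `defined(__has_attribute) && __has_attribute(noreturn)` is still a syntax error, because the whole `#if` expression must parse even when `&&` short-circuits. A minimal sketch of the same guard (`MAYBE_NORETURN`, `die` and `checked_div` are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Two-step probe: the inner #if is reached only when __has_attribute exists,
 * so other compilers never see __has_attribute(noreturn). */
#if defined(__GNUC__) && defined(__has_attribute)
#  if __has_attribute(noreturn)
#    define MAYBE_NORETURN __attribute__((noreturn))
#  endif
#endif
#ifndef MAYBE_NORETURN
#  define MAYBE_NORETURN /* unsupported: expands to nothing */
#endif

/* Hypothetical fatal-error helper, similar in role to minicheck's _fail. */
static MAYBE_NORETURN void die(const char *msg) {
  fprintf(stderr, "fatal: %s\n", msg);
  exit(1);
}

static int checked_div(int a, int b) {
  if (b == 0)
    die("division by zero"); /* the compiler may assume this never returns */
  return a / b;
}
```

Knowing that the helper never returns lets compilers and analyzers treat code after patterns such as `if (text == NULL) fail("malloc failed");` as unreachable on the failure path, which avoids spurious null-dereference warnings.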

@@ -10,7 +10,7 @@
Copyright (c) 2003 Greg Stein <gstein@users.sourceforge.net>
Copyright (c) 2005-2007 Steven Solie <steven@solie.ca>
Copyright (c) 2005-2012 Karl Waclawek <karl@waclawek.net>
Copyright (c) 2016-2024 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2016-2025 Sebastian Pipping <sebastian@pipping.org>
Copyright (c) 2017-2022 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2017 Joe Orton <jorton@redhat.com>
Copyright (c) 2017 José Gutiérrez de la Concha <jose@zeroc.com>
@@ -59,6 +59,9 @@
#include "handlers.h"
#include "misc_tests.h"
void XMLCALL accumulate_characters_ext_handler(void *userData,
const XML_Char *s, int len);
/* Test that a failure to allocate the parser structure fails gracefully */
START_TEST(test_misc_alloc_create_parser) {
XML_Memory_Handling_Suite memsuite = {duff_allocator, realloc, free};
@@ -208,7 +211,7 @@ START_TEST(test_misc_version) {
if (! versions_equal(&read_version, &parsed_version))
fail("Version mismatch");
if (xcstrcmp(version_text, XCS("expat_2.6.4"))) /* needs bump on releases */
if (xcstrcmp(version_text, XCS("expat_2.7.1"))) /* needs bump on releases */
fail("XML_*_VERSION in expat.h out of sync?\n");
}
END_TEST
@@ -294,6 +297,7 @@ START_TEST(test_misc_stop_during_end_handler_issue_240_1) {
parser = XML_ParserCreate(NULL);
XML_SetElementHandler(parser, start_element_issue_240, end_element_issue_240);
mydata = (DataIssue240 *)malloc(sizeof(DataIssue240));
assert_true(mydata != NULL);
mydata->parser = parser;
mydata->deep = 0;
XML_SetUserData(parser, mydata);
@@ -315,6 +319,7 @@ START_TEST(test_misc_stop_during_end_handler_issue_240_2) {
parser = XML_ParserCreate(NULL);
XML_SetElementHandler(parser, start_element_issue_240, end_element_issue_240);
mydata = (DataIssue240 *)malloc(sizeof(DataIssue240));
assert_true(mydata != NULL);
mydata->parser = parser;
mydata->deep = 0;
XML_SetUserData(parser, mydata);
@@ -328,64 +333,119 @@ START_TEST(test_misc_stop_during_end_handler_issue_240_2) {
END_TEST
START_TEST(test_misc_deny_internal_entity_closing_doctype_issue_317) {
const char *const inputOne = "<!DOCTYPE d [\n"
"<!ENTITY % e ']><d/>'>\n"
"\n"
"%e;";
const char *const inputOne
= "<!DOCTYPE d [\n"
"<!ENTITY % element_d '<!ELEMENT d (#PCDATA)*>'>\n"
"%element_d;\n"
"<!ENTITY % e ']><d/>'>\n"
"\n"
"%e;";
const char *const inputTwo
= "<!DOCTYPE d [\n"
"<!ENTITY % element_d '<!ELEMENT d (#PCDATA)*>'>\n"
"%element_d;\n"
"<!ENTITY % e1 ']><d/>'><!ENTITY % e2 '&#37;e1;'>\n"
"\n"
"%e2;";
const char *const inputThree = "<!DOCTYPE d [\n"
"<!ENTITY % e ']><d'>\n"
"\n"
"%e;/>";
const char *const inputIssue317 = "<!DOCTYPE doc [\n"
"<!ENTITY % foo ']>\n"
"<doc>Hell<oc (#PCDATA)*>'>\n"
"%foo;\n"
"]>\n"
"<doc>Hello, world</dVc>";
const char *const inputThree
= "<!DOCTYPE d [\n"
"<!ENTITY % element_d '<!ELEMENT d (#PCDATA)*>'>\n"
"%element_d;\n"
"<!ENTITY % e ']><d'>\n"
"\n"
"%e;/>";
const char *const inputIssue317
= "<!DOCTYPE doc [\n"
"<!ENTITY % element_doc '<!ELEMENT doc (#PCDATA)*>'>\n"
"%element_doc;\n"
"<!ENTITY % foo ']>\n"
"<doc>Hell<oc (#PCDATA)*>'>\n"
"%foo;\n"
"]>\n"
"<doc>Hello, world</dVc>";
const char *const inputs[] = {inputOne, inputTwo, inputThree, inputIssue317};
const XML_Bool suspendOrNot[] = {XML_FALSE, XML_TRUE};
size_t inputIndex = 0;
for (; inputIndex < sizeof(inputs) / sizeof(inputs[0]); inputIndex++) {
set_subtest("%s", inputs[inputIndex]);
XML_Parser parser;
enum XML_Status parseResult;
int setParamEntityResult;
XML_Size lineNumber;
XML_Size columnNumber;
const char *const input = inputs[inputIndex];
for (size_t suspendOrNotIndex = 0;
suspendOrNotIndex < sizeof(suspendOrNot) / sizeof(suspendOrNot[0]);
suspendOrNotIndex++) {
const char *const input = inputs[inputIndex];
const XML_Bool suspend = suspendOrNot[suspendOrNotIndex];
if (suspend && (g_chunkSize > 0)) {
// We cannot use _XML_Parse_SINGLE_BYTES below due to suspension, and
// so chunk sizes >0 would only repeat the very same test
// due to use of plain XML_Parse; we are saving upon that runtime:
return;
}
parser = XML_ParserCreate(NULL);
setParamEntityResult
= XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
if (setParamEntityResult != 1)
fail("Failed to set XML_PARAM_ENTITY_PARSING_ALWAYS.");
set_subtest("[input=%d suspend=%s] %s", (int)inputIndex,
suspend ? "true" : "false", input);
XML_Parser parser;
enum XML_Status parseResult;
int setParamEntityResult;
XML_Size lineNumber;
XML_Size columnNumber;
parser = XML_ParserCreate(NULL);
setParamEntityResult
= XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
if (setParamEntityResult != 1)
fail("Failed to set XML_PARAM_ENTITY_PARSING_ALWAYS.");
if (suspend) {
XML_SetUserData(parser, parser);
XML_SetElementDeclHandler(parser, suspend_after_element_declaration);
}
if (suspend) {
// can't use SINGLE_BYTES here, because it'll return early on
// suspension, and we won't know exactly how much input we actually
// managed to give Expat.
parseResult = XML_Parse(parser, input, (int)strlen(input), 0);
while (parseResult == XML_STATUS_SUSPENDED) {
parseResult = XML_ResumeParser(parser);
}
if (parseResult != XML_STATUS_ERROR) {
// can't use SINGLE_BYTES here, because it'll return early on
// suspension, and we won't know exactly how much input we actually
// managed to give Expat.
parseResult = XML_Parse(parser, "", 0, 1);
}
while (parseResult == XML_STATUS_SUSPENDED) {
parseResult = XML_ResumeParser(parser);
}
} else {
parseResult
= _XML_Parse_SINGLE_BYTES(parser, input, (int)strlen(input), 0);
if (parseResult != XML_STATUS_ERROR) {
parseResult = _XML_Parse_SINGLE_BYTES(parser, "", 0, 1);
}
}
parseResult = _XML_Parse_SINGLE_BYTES(parser, input, (int)strlen(input), 0);
if (parseResult != XML_STATUS_ERROR) {
parseResult = _XML_Parse_SINGLE_BYTES(parser, "", 0, 1);
if (parseResult != XML_STATUS_ERROR) {
fail("Parsing was expected to fail but succeeded.");
}
if (XML_GetErrorCode(parser) != XML_ERROR_INVALID_TOKEN)
fail("Error code does not match XML_ERROR_INVALID_TOKEN");
lineNumber = XML_GetCurrentLineNumber(parser);
if (lineNumber != 6)
fail("XML_GetCurrentLineNumber does not work as expected.");
columnNumber = XML_GetCurrentColumnNumber(parser);
if (columnNumber != 0)
fail("XML_GetCurrentColumnNumber does not work as expected.");
XML_ParserFree(parser);
}
if (XML_GetErrorCode(parser) != XML_ERROR_INVALID_TOKEN)
fail("Error code does not match XML_ERROR_INVALID_TOKEN");
lineNumber = XML_GetCurrentLineNumber(parser);
if (lineNumber != 4)
fail("XML_GetCurrentLineNumber does not work as expected.");
columnNumber = XML_GetCurrentColumnNumber(parser);
if (columnNumber != 0)
fail("XML_GetCurrentColumnNumber does not work as expected.");
XML_ParserFree(parser);
}
}
END_TEST
@@ -519,6 +579,105 @@ START_TEST(test_misc_stopparser_rejects_unstarted_parser) {
}
END_TEST
/* Adaptation of accumulate_characters that takes ExtHdlrData input to work with
* test_renter_loop_finite_content below */
void XMLCALL
accumulate_characters_ext_handler(void *userData, const XML_Char *s, int len) {
ExtHdlrData *const test_data = (ExtHdlrData *)userData;
CharData_AppendXMLChars(test_data->storage, s, len);
}
/* Test that internalEntityProcessor does not re-enter forever;
 * based on files tests/xmlconf/xmltest/valid/ext-sa/012.{xml,ent} */
START_TEST(test_renter_loop_finite_content) {
  CharData storage;
  CharData_Init(&storage);
  const char *const text = "<!DOCTYPE doc [\n"
                           "<!ENTITY e1 '&e2;'>\n"
                           "<!ENTITY e2 '&e3;'>\n"
                           "<!ENTITY e3 SYSTEM '012.ent'>\n"
                           "<!ENTITY e4 '&e5;'>\n"
                           "<!ENTITY e5 '(e5)'>\n"
                           "<!ELEMENT doc (#PCDATA)>\n"
                           "]>\n"
                           "<doc>&e1;</doc>\n";
  ExtHdlrData test_data = {"&e4;\n", external_entity_null_loader, &storage};
  const XML_Char *const expected = XCS("(e5)\n");
  XML_Parser parser = XML_ParserCreate(NULL);
  assert_true(parser != NULL);
  XML_SetUserData(parser, &test_data);
  XML_SetExternalEntityRefHandler(parser, external_entity_oneshot_loader);
  XML_SetCharacterDataHandler(parser, accumulate_characters_ext_handler);
  if (_XML_Parse_SINGLE_BYTES(parser, text, (int)strlen(text), XML_TRUE)
      == XML_STATUS_ERROR)
    xml_failure(parser);
  CharData_CheckXMLChars(&storage, expected);
  XML_ParserFree(parser);
}
END_TEST
// Inspired by function XML_OriginalString of Perl's XML::Parser
static char *
dup_original_string(XML_Parser parser) {
  const int byte_count = XML_GetCurrentByteCount(parser);
  assert_true(byte_count >= 0);
  int offset = -1;
  int size = -1;
  const char *const context = XML_GetInputContext(parser, &offset, &size);
#if XML_CONTEXT_BYTES > 0
  assert_true(context != NULL);
  assert_true(offset >= 0);
  assert_true(size >= 0);
  return portable_strndup(context + offset, byte_count);
#else
  assert_true(context == NULL);
  return NULL;
#endif
}
static void
on_characters_issue_980(void *userData, const XML_Char *s, int len) {
  (void)s;
  (void)len;
  XML_Parser parser = (XML_Parser)userData;
  char *const original_string = dup_original_string(parser);
#if XML_CONTEXT_BYTES > 0
  assert_true(original_string != NULL);
  assert_true(strcmp(original_string, "&draft.day;") == 0);
  free(original_string);
#else
  assert_true(original_string == NULL);
#endif
}
START_TEST(test_misc_expected_event_ptr_issue_980) {
  // NOTE: This is a tiny subset of sample "REC-xml-19980210.xml"
  // from Perl's XML::Parser
  const char *const doc = "<!DOCTYPE day [\n"
                          " <!ENTITY draft.day '10'>\n"
                          "]>\n"
                          "<day>&draft.day;</day>\n";
  XML_Parser parser = XML_ParserCreate(NULL);
  XML_SetUserData(parser, parser);
  XML_SetCharacterDataHandler(parser, on_characters_issue_980);
  assert_true(_XML_Parse_SINGLE_BYTES(parser, doc, (int)strlen(doc),
                                      /*isFinal=*/XML_TRUE)
              == XML_STATUS_OK);
  XML_ParserFree(parser);
}
END_TEST
void
make_miscellaneous_test_case(Suite *s) {
  TCase *tc_misc = tcase_create("miscellaneous tests");
@@ -545,4 +704,6 @@ make_miscellaneous_test_case(Suite *s) {
  tcase_add_test(tc_misc, test_misc_char_handler_stop_without_leak);
  tcase_add_test(tc_misc, test_misc_resumeparser_not_crashing);
  tcase_add_test(tc_misc, test_misc_stopparser_rejects_unstarted_parser);
  tcase_add_test__if_xml_ge(tc_misc, test_renter_loop_finite_content);
  tcase_add_test(tc_misc, test_misc_expected_event_ptr_issue_980);
}
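The entity chain exercised by test_renter_loop_finite_content (e1 to e2 to the external e3, whose loader data is "&e4;\n", then e4 to e5) can be modelled as a small toy. This sketch is not Expat's internalEntityProcessor: the names toy_entity, toy_lookup, toy_expand and the limit TOY_MAX_DEPTH are hypothetical, and it assumes the external entity's content is simply spliced in like an internal one. It expands references by keeping one cursor per open replacement text on an explicit stack, so nesting is bounded by a fixed array rather than by the C call stack.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy entity table modelled on the chain in
   test_renter_loop_finite_content. "e3" stands in for the external
   entity, using the content the test's loader supplies ("&e4;\n").
   "loop" is an extra, self-referencing entry for the depth check. */
struct toy_entity {
  const char *name;
  const char *text;
};

static const struct toy_entity toy_entities[] = {
    {"e1", "&e2;"}, {"e2", "&e3;"}, {"e3", "&e4;\n"},
    {"e4", "&e5;"}, {"e5", "(e5)"}, {"loop", "&loop;"},
};

/* Returns the replacement text of the entity named by the `len` bytes
   at `name`, or NULL if there is no such entity. */
static const char *toy_lookup(const char *name, size_t len) {
  size_t i;
  for (i = 0; i < sizeof(toy_entities) / sizeof(toy_entities[0]); i++) {
    if (strlen(toy_entities[i].name) == len
        && strncmp(toy_entities[i].name, name, len) == 0)
      return toy_entities[i].text;
  }
  return NULL;
}

#define TOY_MAX_DEPTH 16 /* hypothetical nesting limit */

/* Expands entity references in `input` into `out` (capacity `cap`),
   keeping one cursor per open replacement text on an explicit stack
   instead of recursing. Returns 0 on success and -1 on an unknown or
   unterminated reference, nesting deeper than TOY_MAX_DEPTH, or when
   `out` is too small. */
static int toy_expand(const char *input, char *out, size_t cap) {
  const char *stack[TOY_MAX_DEPTH];
  int depth = 0;
  size_t used = 0;

  if (cap == 0)
    return -1;
  stack[depth++] = input;
  while (depth > 0) {
    const char *p = stack[depth - 1];
    if (*p == '\0') { /* this text is done: resume the parent */
      depth--;
      continue;
    }
    if (*p == '&') {
      const char *semi = strchr(p + 1, ';');
      const char *text;
      if (semi == NULL)
        return -1;
      text = toy_lookup(p + 1, (size_t)(semi - (p + 1)));
      if (text == NULL || depth == TOY_MAX_DEPTH)
        return -1;
      stack[depth - 1] = semi + 1; /* continue after ';' later */
      stack[depth++] = text;       /* expand the replacement text now */
      continue;
    }
    if (used + 1 >= cap) /* keep room for the terminating NUL */
      return -1;
    out[used++] = *p;
    stack[depth - 1] = p + 1;
  }
  out[used] = '\0';
  return 0;
}
```

Expanding "&e1;" produces "(e5)\n", the text the test expects, while the self-referencing loop entry makes toy_expand return -1 at the depth limit instead of running forever.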

@@ -2,8 +2,8 @@
# EXPAT TEST SCRIPT FOR W3C XML TEST SUITE
#
# This script can be used to exercise Expat against the
# w3c.org xml test suite, available from
# http://www.w3.org/XML/Test/xmlts20020606.zip.
# w3c.org xml test suite, available from:
# https://www.w3.org/XML/Test/xmlts20020606.zip
#
# To run this script, first set XMLWF below so that xmlwf can be
# found, then set the output directory with OUTPUT.
@@ -30,6 +30,7 @@
# Copyright (c) 2002 Karl Waclawek <karl@waclawek.net>
# Copyright (c) 2008-2019 Sebastian Pipping <sebastian@pipping.org>
# Copyright (c) 2017 Rhodri James <rhodri@wildebeest.org.uk>
# Copyright (c) 2025 Hanno Böck <hanno@gentoo.org>
# Licensed under the MIT license:
#
# Permission is hereby granted, free of charge, to any person obtaining

@@ -14,6 +14,7 @@
Copyright (c) 2017 Rhodri James <rhodri@wildebeest.org.uk>
Copyright (c) 2017 Franek Korta <fkorta@gmail.com>
Copyright (c) 2022 Sean McBride <sean@rogue-research.com>
Copyright (c) 2025 Hanno Böck <hanno@gentoo.org>
Licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining
@@ -55,7 +56,7 @@
# define EXPAT_read_count_t int
# define EXPAT_read_req_t unsigned int
#else /* POSIX */
/* http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html */
/* https://pubs.opengroup.org/onlinepubs/009695399/functions/read.html */
# define EXPAT_read read
# define EXPAT_read_count_t ssize_t
# define EXPAT_read_req_t size_t
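The POSIX branch maps EXPAT_read to read(), which takes a size_t byte count and returns ssize_t; the branch before it uses int and unsigned int for the same two roles. A caller has to tell an error (-1) apart from end of file (0) and from a short read, which is why the count type is signed. The sketch below assumes a POSIX system; read_fully is a hypothetical helper for illustration, not Expat's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* POSIX branch of the macros above, repeated so this sketch compiles
   on its own. */
#define EXPAT_read read
#define EXPAT_read_count_t ssize_t
#define EXPAT_read_req_t size_t

/* Hypothetical helper (not part of Expat): reads up to `len` bytes from
   `fd` into `buf`, retrying after short reads and EINTR. Returns the
   number of bytes read, which is less than `len` only at end of file,
   or -1 on a read error. */
static EXPAT_read_count_t read_fully(int fd, char *buf, size_t len) {
  size_t total = 0;
  while (total < len) {
    const EXPAT_read_count_t n
        = EXPAT_read(fd, buf + total, (EXPAT_read_req_t)(len - total));
    if (n < 0) {
      if (errno == EINTR)
        continue; /* interrupted by a signal before any data arrived */
      return -1;
    }
    if (n == 0) /* end of file */
      break;
    total += (size_t)n;
  }
  return (EXPAT_read_count_t)total;
}
```

Keeping the retry loop in one place means the rest of a reader only ever sees "full buffer", "end of file reached early", or "error", regardless of how the kernel split the data.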