cherrypy._cpreqbody module
Request body processing for CherryPy.
New in version 3.2.
Application authors have complete control over the parsing of HTTP request entities. In short, cherrypy.request.body is now always set to an instance of RequestBody, and that class is a subclass of Entity.
When an HTTP request includes an entity body, it is often desirable to provide that information to applications in a form other than the raw bytes. Different content types demand different approaches. Examples:
- For a GIF file, we want the raw bytes in a stream.
- An HTML form is better parsed into its component fields, and each text field decoded from bytes to unicode.
- A JSON body should be deserialized into a Python dict or list.
When the request contains a Content-Type header, the media type is used as a key to look up a value in the request.body.processors dict. If the full media type is not found, then the major type is tried; for example, if no processor is found for the ‘image/jpeg’ type, then we look for a processor for the ‘image’ major type. If neither the full type nor the major type has a matching processor, then a default processor is used (default_proc). For most types, this means no processing is done, and the body is left unread as a raw byte stream. Processors are configurable in an ‘on_start_resource’ hook.
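The lookup order described above (full media type, then major type, then the default) can be sketched in plain Python. This is only an illustration of the fallback rule, not CherryPy's own code; find_processor and the three sample processors are hypothetical names.

```python
def find_processor(content_type, processors, default_proc):
    """Pick a processor: full media type first, then major type, then default."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type in processors:
        return processors[media_type]
    # Fall back to the major type, e.g. "image" for "image/jpeg".
    major_type = media_type.split('/', 1)[0]
    return processors.get(major_type, default_proc)


# Hypothetical processors, used only to show which one is selected.
def process_image(entity):
    return 'image processor'


def process_json(entity):
    return 'json processor'


def default_proc(entity):
    return 'default processor'


processors = {'image': process_image, 'application/json': process_json}

chosen = find_processor('image/jpeg', processors, default_proc)
print(chosen.__name__)
```

Here 'image/jpeg' has no exact entry, so the processor registered for the major type 'image' is chosen; a type such as 'text/plain' would fall through to the default.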
Some processors, especially those for the ‘text’ types, attempt to decode bytes to unicode. If the Content-Type request header includes a ‘charset’ parameter, this is used to decode the entity. Otherwise, one or more default charsets may be attempted, although this decision is up to each processor. If a processor successfully decodes an Entity or Part, it should set the charset attribute on the Entity or Part to the name of the successful charset, so that applications can easily re-encode or transcode the value if they wish.
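As a rough illustration of the decoding rule, the sketch below tries a list of charsets in order and reports which one succeeded. decode_entity is a hypothetical helper, and raising ValueError when nothing matches is an assumption made for the example, not CherryPy's behavior.

```python
def decode_entity(raw, attempt_charsets):
    """Return (text, charset) for the first charset that decodes raw cleanly."""
    for charset in attempt_charsets:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            # This charset does not fit the bytes; try the next one.
            continue
    # Assumption for this sketch: signal failure with a plain ValueError.
    raise ValueError('none of the charsets could decode the body')


raw = 'café'.encode('iso-8859-1')  # b'caf\xe9', which is not valid UTF-8
text, charset = decode_entity(raw, ['utf-8', 'iso-8859-1'])

# Knowing the successful charset lets the application re-encode the value.
round_trip = text.encode(charset)
```

Because the name of the successful charset is kept, the decoded text can be turned back into the original bytes or transcoded to another encoding.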
If the Content-Type of the request entity is of major type ‘multipart’, then the above parsing process, and possibly a decoding process, is performed for each part.
For both the full entity and multipart parts, a Content-Disposition header may be used to fill name and filename attributes on the request.body or the Part.
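To show where the name and filename values come from, the following sketch pulls both parameters out of a Content-Disposition header using the standard library's email.message.Message. This is a stand-in for illustration; CherryPy uses its own header parsing, and disposition_params is a hypothetical helper.

```python
from email.message import Message


def disposition_params(header_value):
    """Return (name, filename) from a Content-Disposition header value."""
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_param returns None when the parameter is absent.
    name = msg.get_param('name', header='content-disposition')
    filename = msg.get_param('filename', header='content-disposition')
    return name, filename


name, filename = disposition_params(
    'form-data; name="upload"; filename="report.txt"')
```

A plain form field typically carries only a name parameter, in which case the filename comes back as None.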
Custom Processors
You can add your own processors for any specific or major MIME type. Simply add each one to the processors dict in a hook or tool that runs at on_start_resource or before_request_body.
Here’s the built-in JSON tool for an example:
    # Excerpt from CherryPy's JSON tool. It relies on the cherrypy package and
    # on json_decode, a JSON-decoding helper imported elsewhere in that module.
    def json_in(force=True, debug=False):
        request = cherrypy.serving.request

        def json_processor(entity):
            '''Read application/json data into request.json.'''
            if not entity.headers.get("Content-Length", ""):
                raise cherrypy.HTTPError(411)

            body = entity.fp.read()
            try:
                request.json = json_decode(body)
            except ValueError:
                raise cherrypy.HTTPError(400, 'Invalid JSON document')

        if force:
            request.body.processors.clear()
            request.body.default_proc = cherrypy.HTTPError(
                415, 'Expected an application/json content type')
        request.body.processors['application/json'] = json_processor
We begin by defining a new json_processor function to stick in the processors dictionary. All processor functions take a single argument, the Entity instance they are to process. The processor will be called whenever a request with a Content-Type of “application/json” is received (for those URIs where the tool is turned on).
First, it checks that a Content-Length header is present (raising 411 if it is not), then reads the remaining bytes on the socket. The fp object knows its own length, so it won’t hang waiting for data that never arrives. It will return when all data has been read. Then, we decode those bytes using Python’s built-in json module, and stick the decoded result onto request.json. If it cannot be decoded, we raise 400.
If the “force” argument is True (the default), the Tool clears the processors dict so that request entities of other Content-Types aren’t parsed at all. Since there’s no entry for those invalid MIME types, the default_proc method of cherrypy.request.body is called. But this does nothing by default (usually to provide the page handler an opportunity to handle it). But in our case, we want to raise 415, so we replace request.body.default_proc with the error (HTTPError instances, when called, raise themselves).
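The trick of installing an exception instance as default_proc can be illustrated without CherryPy. In this minimal sketch, SelfRaisingError and FakeBody are hypothetical stand-ins for cherrypy.HTTPError and the request body; only the "calling the instance raises it" idea is taken from the text above.

```python
class SelfRaisingError(Exception):
    """An exception whose instances raise themselves when called."""

    def __init__(self, status, message=''):
        super().__init__(status, message)
        self.status = status
        self.message = message

    def __call__(self, *args, **kwargs):
        raise self


class FakeBody:
    """Simplified body: look up a processor, else call default_proc."""

    def __init__(self):
        self.processors = {}

    def default_proc(self):
        # Default behaviour: do nothing and leave the body unread.
        return None

    def process(self, content_type):
        processor = self.processors.get(content_type)
        if processor is None:
            # An instance attribute named default_proc shadows the method.
            self.default_proc()
        else:
            processor(self)


body = FakeBody()
body.process('text/plain')  # no processor registered: silently ignored

# Replace the fallback with an error instance, as the tool does.
body.default_proc = SelfRaisingError(415, 'Expected application/json')
```

After the replacement, processing any unregistered content type raises the 415 error instead of silently doing nothing.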
If you are defining a custom processor, you can do so without making a Tool. Just add the config entry:
    request.body.processors = {'application/json': json_processor}
Note that you can only replace the processors dict wholesale this way, not update the existing one.
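A hedged sketch of what such a config entry might look like in a Python config dict follows. The '/api' path is purely illustrative, and this json_processor is a simplified, hypothetical variant that stores its result on the entity itself (the built-in tool stores it on request.json). The processor is exercised against a stand-in object so the example runs without CherryPy.

```python
import io
import json
from types import SimpleNamespace


def json_processor(entity):
    """Hypothetical processor: parse the body and keep the result on the entity."""
    entity.json = json.loads(entity.fp.read().decode('utf-8'))


# Assumed per-path config section; '/api' is only an example path.
# The dict replaces the processors mapping wholesale for that path.
config = {
    '/api': {
        'request.body.processors': {'application/json': json_processor},
    },
}

# Exercise the processor against a stand-in entity with a readable fp.
fake_entity = SimpleNamespace(fp=io.BytesIO(b'{"answer": 42}'))
config['/api']['request.body.processors']['application/json'](fake_entity)
```

Because the dict is replaced rather than updated, only the content types listed in it have processors on that path; anything else falls through to default_proc.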
- class cherrypy._cpreqbody.Entity(fp, headers, params=None, parts=None)
Bases: object
An HTTP request body, or MIME multipart body.
This class collects information about the HTTP request entity. When a given entity is of MIME type “multipart”, each part is parsed into its own Entity instance, and the set of parts is stored in entity.parts.
Between the before_request_body and before_handler tools, CherryPy tries to process the request body (if any) by calling request.body.process. This uses the content_type of the Entity to look up a suitable processor in Entity.processors, a dict. If a matching processor cannot be found for the complete Content-Type, it tries again using the major type. For example, if a request with an entity of type “image/jpeg” arrives, but no processor can be found for that complete type, then one is sought for the major type “image”. If a processor is still not found, then the default_proc method of the Entity is called (which does nothing by default; you can override this too).
CherryPy includes processors for the “application/x-www-form-urlencoded” type, the “multipart/form-data” type, and the “multipart” major type. CherryPy 3.2 processes these types almost exactly as older versions. Parts are passed as arguments to the page handler using their Content-Disposition.name if given, otherwise in a generic “parts” argument. Each such part is either a string, or the Part itself if it’s a file. (In this case it will have file and filename attributes, or possibly a value attribute.) Each Part is itself a subclass of Entity, and has its own process method and processors dict.
There is a separate processor for the “multipart” major type which is more flexible, and simply stores all multipart parts in request.body.parts. You can enable it with:
    cherrypy.request.body.processors['multipart'] = _cpreqbody.process_multipart
in an on_start_resource tool.
- attempt_charsets = ['utf-8']
A list of strings, each of which should be a known encoding.
When the Content-Type of the request body warrants it, each of the given encodings will be tried in order. The first one to successfully decode the entity without raising an error is stored as entity.charset. This defaults to ['utf-8'] (plus ‘ISO-8859-1’ for “text/*” types, as required by HTTP/1.1), but ['us-ascii', 'utf-8'] for multipart parts.
- charset = None
The name of the charset that successfully decoded the entity; see attempt_charsets above.
- content_type = None
The value of the Content-Type request header.
If the Entity is part of a multipart payload, this will be the Content-Type given in the MIME headers for this part.
- default_content_type = 'application/x-www-form-urlencoded'
This defines a default Content-Type to use if no Content-Type header is given.
The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however, the MIME spec declares that a part with no Content-Type defaults to “text/plain” (see Part).
- default_proc()
Process unknown data as a fallback.
Called if a more-specific processor is not found for the Content-Type.
- filename = None
The Content-Disposition.filename header, if available.
- fp = None
The readable socket file object.
- headers = None
A dict of request/multipart header names and values.
This is a copy of the request.headers for the request.body; for multipart parts, it is the set of headers for that part.
- length = None
The value of the Content-Length header, if provided.
- make_file()
Return a file-like object into which the request body will be read.
By default, this will return a TemporaryFile. Override as needed. See also cherrypy._cpreqbody.Part.maxrambytes.
- name = None
The “name” parameter of the Content-Disposition header, if any.
- params = None
If the request Content-Type is ‘application/x-www-form-urlencoded’ or multipart, this will be a dict of the params pulled from the entity body; that is, it will be the portion of request.params that comes from the message body (sometimes called “POST params”, although they can be sent with various HTTP method verbs). This value is set between the ‘before_request_body’ and ‘before_handler’ hooks (assuming that process_request_body is True).
- part_class
The class used for multipart parts.
You can replace this with custom subclasses to alter the processing of multipart parts.
alias of Part
- parts = None
A list of Part instances if Content-Type is of major type “multipart”.
- processors = {'application/x-www-form-urlencoded': <function process_urlencoded>, 'multipart': <function process_multipart>, 'multipart/form-data': <function process_multipart_form_data>}
A dict of Content-Type names to processor methods.
- class cherrypy._cpreqbody.Part(fp, headers, boundary)
Bases: Entity
A MIME part entity, part of a multipart entity.
- attempt_charsets = ['us-ascii', 'utf-8']
A list of strings, each of which should be a known encoding.
When the Content-Type of the request body warrants it, each of the given encodings will be tried in order. The first one to successfully decode the entity without raising an error is stored as entity.charset. This defaults to ['utf-8'] (plus ‘ISO-8859-1’ for “text/*” types, as required by HTTP/1.1), but ['us-ascii', 'utf-8'] for multipart parts.
- boundary = None
The MIME multipart boundary.
- default_content_type = 'text/plain'
This defines a default Content-Type to use if no Content-Type header is given. The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however (this class), the MIME spec declares that a part with no Content-Type defaults to “text/plain”.
- default_proc()
Process unknown data as a fallback.
Called if a more-specific processor is not found for the Content-Type.
- maxrambytes = 1000
The threshold of bytes after which point the Part will store its data in a file (generated by make_file) instead of a string. Defaults to 1000, just like the cgi module in Python’s standard library.
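The threshold behavior can be sketched as follows: data is buffered in memory until it exceeds the limit, and only then spilled to a file. store_part is a hypothetical function written for illustration under that assumption; it is not the Part implementation.

```python
import tempfile


def store_part(chunks, maxrambytes=1000, make_file=tempfile.TemporaryFile):
    """Return bytes for small bodies, or a file object once maxrambytes is exceeded."""
    buffered = []
    size = 0
    chunks = iter(chunks)
    for chunk in chunks:
        buffered.append(chunk)
        size += len(chunk)
        if size > maxrambytes:
            break
    else:
        # The loop ended without crossing the threshold: keep it in memory.
        return b''.join(buffered)

    # Threshold crossed: move what was buffered, then the rest, into a file.
    fp_out = make_file()
    fp_out.write(b''.join(buffered))
    for chunk in chunks:
        fp_out.write(chunk)
    fp_out.seek(0)
    return fp_out


small = store_part([b'a' * 10, b'b' * 10])
big = store_part([b'x' * 600, b'y' * 600, b'z' * 5])
```

Passing a different make_file callable corresponds to overriding make_file, for example to write uploads to a specific directory.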
- read_into_file(fp_out=None)
Read the request body into fp_out (or make_file() if None).
Return fp_out.
- read_lines_to_boundary(fp_out=None)
Read bytes from self.fp and return or write them to a file.
If the ‘fp_out’ argument is None (the default), all bytes read are returned in a single byte string.
If the ‘fp_out’ argument is not None, it must be a file-like object that supports the ‘write’ method; all bytes read will be written to the fp, and that fp is returned.
- class cherrypy._cpreqbody.RequestBody(fp, headers, params=None, request_params=None)
Bases: Entity
The entity of the HTTP request.
- bufsize = 8192
The buffer size used when reading the socket.
- default_content_type = ''
This defines a default Content-Type to use if no Content-Type header is given. The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however, the MIME spec declares that a part with no Content-Type defaults to “text/plain” (see Part).
- maxbytes = None
Raise MaxSizeExceeded if more bytes than this are read from the socket.
- class cherrypy._cpreqbody.SizedReader(fp, length, maxbytes, bufsize=8192, has_trailers=False)
Bases: object
A buffered/sized reader.
- read(size=None, fp_out=None)
Read bytes from the request body and return or write them to a file.
A number of bytes less than or equal to the ‘size’ argument are read off the socket. The actual number of bytes read is tracked in self.bytes_read. The number may be smaller than ‘size’ when 1) the client sends fewer bytes, 2) the ‘Content-Length’ request header specifies fewer bytes than requested, or 3) the number of bytes read exceeds self.maxbytes (in which case, 413 is raised).
If the ‘fp_out’ argument is None (the default), all bytes read are returned in a single byte string.
If the ‘fp_out’ argument is not None, it must be a file-like object that supports the ‘write’ method; all bytes read will be written to the fp, and None is returned.
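The read contract described above can be approximated with a small stand-alone class. BoundedReader and TooLarge are hypothetical names; the sketch assumes a known integer length and ignores buffering and trailers, so it only mirrors the size, bytes_read, maxbytes and fp_out rules.

```python
import io


class TooLarge(Exception):
    """Stand-in for the 413 error raised when maxbytes is exceeded."""


class BoundedReader:
    def __init__(self, fp, length, maxbytes=None):
        self.fp = fp
        self.length = length      # assumed to be a known Content-Length
        self.maxbytes = maxbytes
        self.bytes_read = 0

    def read(self, size=None, fp_out=None):
        # Never read past the declared length of the body.
        remaining = self.length - self.bytes_read
        if size is None or size > remaining:
            size = remaining
        data = self.fp.read(size) if size > 0 else b''
        self.bytes_read += len(data)
        if self.maxbytes is not None and self.bytes_read > self.maxbytes:
            raise TooLarge('request entity too large')
        if fp_out is None:
            return data
        fp_out.write(data)
        return None


reader = BoundedReader(io.BytesIO(b'hello world, extra'), length=11)
first = reader.read(5)     # b'hello'
rest = reader.read(100)    # capped at the remaining 6 bytes
```

Bytes beyond the declared length (", extra" here) are never consumed, which is why a reader of this kind does not hang waiting for data that will not arrive.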
- cherrypy._cpreqbody._old_process_multipart(entity)
Behavior of 3.2 and lower.
Deprecated and will be changed in 3.3.
- cherrypy._cpreqbody.process_multipart_form_data(entity)
Read multipart/form-data parts.
This function saves them into entity.parts or entity.params.