Author: Raymond Hettinger
Release: 3.1a2
Date: April 04, 2009
This article explains the new features in Python 3.1, compared to 3.0.
Regular Python dictionaries iterate over key/value pairs in arbitrary order. Over the years, a number of authors have written alternative implementations that remember the order in which the keys were originally inserted. Based on experience with those implementations, the collections module now has an OrderedDict class.
The OrderedDict API is substantially the same as regular dictionaries but will iterate over keys and values in a guaranteed order depending on when a key was first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.
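The ordering rules described above can be checked with a short sketch. The key names are arbitrary and chosen only for illustration:

```python
from collections import OrderedDict

d = OrderedDict()
d['first'] = 1
d['second'] = 2
d['third'] = 3

# Overwriting an existing key leaves its original position unchanged.
d['first'] = 100
order_after_overwrite = list(d.keys())   # ['first', 'second', 'third']

# Deleting a key and reinserting it moves it to the end.
del d['first']
d['first'] = 1
order_after_reinsert = list(d.keys())    # ['second', 'third', 'first']
```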
The standard library now supports use of ordered dictionaries in several modules. The configparser module uses them by default. This lets configuration files be read, modified, and then written back in their original order. The collections module’s namedtuple._asdict() method now returns an ordered dictionary with the values appearing in the same order as the underlying tuple indices. The json module is being built out with an object_pairs_hook to allow OrderedDicts to be built by the decoder. Support was also added for third-party tools like PyYAML.
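As a rough illustration of the json hook mentioned above, the following sketch assumes the object_pairs_hook parameter is accepted by json.loads(), as it is in released versions. The sample document is made up:

```python
import json
from collections import OrderedDict

# A hypothetical JSON document whose key order we want to preserve.
text = '{"zebra": 1, "apple": 2, "mango": 3}'

# The decoder passes each object's (key, value) pairs, in document order,
# to the hook, which builds the resulting mapping.
data = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(data)   # ['zebra', 'apple', 'mango']
```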
The built-in format() function and the str.format() method use a mini-language that now includes a simple, non-locale-aware way to format a number with a thousands separator. That provides a way to humanize a program’s output, improving its professional appearance and readability:
>>> from decimal import Decimal
>>> format(Decimal('1234567.89'), ',f')
'1,234,567.89'
The currently supported types are int and decimal.Decimal. Support for float is expected before the beta release. Discussions are underway about how to specify alternative separators like dots, spaces, apostrophes, or underscores. Locale-aware applications should use the existing n format specifier which already has some support for thousands separators.
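For integers, the same separator can be used through either entry point. A minimal sketch, with an arbitrary sample value:

```python
amount = 1234567

# The comma in the format spec requests a thousands separator.
with_builtin = format(amount, ',d')           # '1,234,567'
with_method = 'Total: {:,}'.format(amount)    # 'Total: 1,234,567'
```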
Some smaller changes made to the core Python language are:
The int() type gained a bit_length method that returns the number of bits necessary to represent its argument in binary:
>>> n = 37
>>> bin(37)
'0b100101'
>>> n.bit_length()
6
>>> n = 2**123-1
>>> n.bit_length()
123
>>> (n+1).bit_length()
124
(Contributed by Fredrik Johansson, Victor Stinner, Raymond Hettinger, and Mark Dickinson; issue 3439.)
Added a collections.Counter class to support convenient counting of unique items in a sequence or iterable:
>>> from collections import Counter
>>> Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
Counter({'blue': 3, 'red': 2, 'green': 1})
(Contributed by Raymond Hettinger; issue 1696199.)
Added a new module, tkinter.ttk, for access to the Tk themed widget set. The basic idea of ttk is to separate, to the extent possible, the code implementing a widget’s behavior from the code implementing its appearance.
(Contributed by Kevin Walzer and Guilherme Polo; issue 2618 and issue 2983.)
The gzip.GzipFile and bz2.BZ2File classes now support the context manager protocol.
(Contributed by Jacques Frechet; issue 4272.)
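A minimal sketch of the context manager support, using gzip.GzipFile. The scratch file path is hypothetical and created only for this example:

```python
import gzip
import os
import tempfile

# Hypothetical scratch location for the compressed file.
path = os.path.join(tempfile.mkdtemp(), 'example.gz')

# The file is closed automatically when the with block exits.
with gzip.GzipFile(path, 'wb') as f:
    f.write(b'hello, world')

with gzip.GzipFile(path, 'rb') as f:
    payload = f.read()   # b'hello, world'
```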
The decimal module now supports methods for creating a decimal object from a binary float. The conversion is exact but can sometimes be surprising:
>>> from decimal import Decimal
>>> Decimal.from_float(1.1)
Decimal('1.100000000000000088817841970012523233890533447265625')
The long decimal result shows the actual binary fraction being stored for 1.1. The fraction has many digits because 1.1 cannot be exactly represented in binary.
(Contributed by Raymond Hettinger and Mark Dickinson.)
The fields in format() strings can now be automatically numbered:
>>> 'Sir {} of {}'.format('Gallahad', 'Camelot')
'Sir Gallahad of Camelot'
Formerly, the string would have required numbered fields such as: 'Sir {0} of {1}'.
(Contributed by Eric Smith; issue 5237.)
The itertools module grew two new functions. The itertools.combinations_with_replacement() function is one of four functions for generating combinatoric sequences, alongside those for permutations and Cartesian products. The itertools.compress() function mimics its namesake from APL. Also, the existing itertools.count() function now has an optional step argument and can accept any type of counting sequence including fractions.Fraction and decimal.Decimal.
(Contributed by Raymond Hettinger.)
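The three itertools changes can be sketched as follows. The inputs are arbitrary, and islice() is used only to take a finite prefix of the infinite counter:

```python
from fractions import Fraction
from itertools import combinations_with_replacement, compress, count, islice

# Combinations in which an element may be chosen more than once.
pairs = list(combinations_with_replacement('AB', 2))
# [('A', 'A'), ('A', 'B'), ('B', 'B')]

# Keep only the items whose matching selector is true.
picked = list(compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
# ['A', 'C', 'E', 'F']

# Count from 1/2 in steps of 1/4 using exact fractions.
steps = list(islice(count(Fraction(1, 2), Fraction(1, 4)), 3))
# [Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]
```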
collections.namedtuple() now supports a keyword argument rename which lets invalid fieldnames be automatically converted to positional names in the form _0, _1, etc. This is useful when the field names are being created by an external source such as a CSV header, SQL field list, or user input.
(Contributed by Raymond Hettinger; issue 1818.)
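A short sketch of rename in action. The header list stands in for field names from an external source and is invented for the example:

```python
from collections import namedtuple

# 'class' is a keyword and the second 'id' is a duplicate,
# so neither is a valid field name.
header = ['id', 'class', 'name', 'id']

Row = namedtuple('Row', header, rename=True)
fields = Row._fields   # ('id', '_1', 'name', '_3')
```

Each replacement name is built from the position of the invalid field, so the remaining valid names keep their places.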
round(x, n) now returns an integer if x is an integer. Previously it returned a float.
(Contributed by Mark Dickinson; issue 4707.)
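A small check of the new return type, with an arbitrary integer and a negative n so that the rounding is visible:

```python
# Round an integer to the nearest hundred.
rounded = round(1234, -2)   # 1200
kind = type(rounded)        # int, not float
```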
The re.sub(), re.subn() and re.split() functions now accept a flags parameter.
(Contributed by Gregory Smith.)
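A minimal sketch of the new parameter. Passing flags by keyword avoids confusing it with the positional count and maxsplit arguments:

```python
import re

text = 'Spam, SPAM, spam'

# Case-insensitive substitution without precompiling the pattern.
replaced = re.sub('spam', 'eggs', text, flags=re.IGNORECASE)
# 'eggs, eggs, eggs'

# Case-insensitive split on the letter x.
parts = re.split('x', 'aXbxc', flags=re.IGNORECASE)
# ['a', 'b', 'c']
```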
The runpy module, which supports the -m command line switch, now supports the execution of packages by looking for and executing a __main__ submodule when a package name is supplied.
(Contributed by Andi Vajda; issue 4195.)
The pdb module can now access and display source code loaded via zipimport (or any other conformant PEP 302 loader).
(Contributed by Alexander Belopolsky; issue 4201.)
functools.partial objects can now be pickled.
(Suggested by Antoine Pitrou and Jesse Noller. Implemented by Jack Diederich; issue 5228.)
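A round trip through pickle might look like the following sketch. The name int_from_binary is made up for the example:

```python
import pickle
from functools import partial

# A partial that parses binary strings; int is picklable by reference.
int_from_binary = partial(int, base=2)

# Serialize and restore the partial object, then call the copy.
restored = pickle.loads(pickle.dumps(int_from_binary))
value = restored('1010')   # 10
```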
Added pydoc help topics for symbols so that help('@') works as expected in the interactive environment.
(Contributed by David Laban; issue 4739.)
The unittest module now supports skipping individual tests or classes of tests. It also supports marking a test as an expected failure: a test that is known to be broken but shouldn’t be counted as a failure on a TestResult.
(Contributed by Benjamin Peterson.)
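A sketch of both features, assuming the unittest.skip and unittest.expectedFailure decorators as they appear in released versions. The test case and its contents are invented for illustration:

```python
import unittest

class ExampleTests(unittest.TestCase):
    @unittest.skip('demonstrating an unconditional skip')
    def test_skipped(self):
        self.fail('never runs')

    @unittest.expectedFailure
    def test_known_bug(self):
        # Known to be broken; recorded separately from real failures.
        self.assertEqual(1, 2)

    def test_passes(self):
        self.assertEqual(1 + 1, 2)

# Run the tests and collect outcomes on a TestResult.
suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
result = unittest.TestResult()
suite.run(result)
```

After the run, the skipped test appears in result.skipped and the broken one in result.expectedFailures, while result.failures stays empty.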
A new module, importlib, was added. It provides a complete, portable, pure Python reference implementation of the import statement and its counterpart, the __import__() function. It represents a substantial step forward in documenting and defining the actions that take place during imports.
(Contributed by Brett Cannon.)
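One common use is importing a module whose name is only known at run time. A minimal sketch using importlib.import_module(), with json standing in for the dynamically chosen module:

```python
import importlib

# The module name could come from configuration or user input.
module_name = 'json'

mod = importlib.import_module(module_name)
encoded = mod.dumps([1, 2])   # '[1, 2]'
```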
Major performance enhancements have been added:
The new I/O library (as defined in PEP 3116) was mostly written in Python and quickly proved to be a problematic bottleneck in Python 3.0. In Python 3.1, the I/O library has been entirely rewritten in C and is 2 to 20 times faster depending on the task at hand. The pure Python version is still available for experimentation purposes through the _pyio module.
(Contributed by Amaury Forgeot d’Arc and Antoine Pitrou.)
Added a heuristic so that tuples and dicts containing only untrackable objects are not tracked by the garbage collector. This can reduce the size of collections and therefore the garbage collection overhead on long-running programs, depending on their particular use of datatypes.
(Contributed by Antoine Pitrou, issue 4688.)
When the configure option --with-computed-gotos is enabled on compilers that support it (notably gcc, SunPro, and icc), the bytecode evaluation loop is compiled with a new dispatch mechanism that gives speedups of up to 20%, depending on the system, the compiler, and the benchmark.
(Contributed by Antoine Pitrou along with a number of other participants, issue 4753.)
The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times faster.
(Contributed by Antoine Pitrou and Amaury Forgeot d’Arc, issue 4868.)
The json module is getting a C extension to substantially improve its performance. The code is expected to be added in time for the beta release.
(Contributed by Bob Ippolito.)
Integers are now stored internally either in base 2**15 or in base 2**30, the base being determined at build time. Previously, they were always stored in base 2**15. Using base 2**30 gives significant performance improvements on 64-bit machines, but benchmark results on 32-bit machines have been mixed. Therefore, the default is to use base 2**30 on 64-bit machines and base 2**15 on 32-bit machines; on Unix, there’s a new configure option --enable-big-digits that can be used to override this default.
Apart from the performance improvements, this change should be invisible to end users, with one exception: for testing and debugging purposes there’s a new structseq sys.int_info that provides information about the internal format, giving the number of bits per digit and the size in bytes of the C type used to store each digit:
>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
(Contributed by Mark Dickinson; issue 4258.)