Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core/kobject updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.2-rc1

  There are a number of ACPI patches in here as well, as Rafael said
  they should go through this tree due to the driver core changes they
  required. They have all been acked by the ACPI developers.

  There are also a number of small subsystem-specific changes in here,
  due to some changes to the kobject core code. Those too have all been
  acked by the various subsystem maintainers.

  As for content, it's pretty boring outside of the ACPI changes:
   - spdx cleanups
   - kobject documentation updates
   - default attribute groups for kobjects
   - other minor kobject/driver core fixes

  All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
  kobject: clean up the kobject add documentation a bit more
  kobject: Fix kernel-doc comment first line
  kobject: Remove docstring reference to kset
  firmware_loader: Fix a typo ("syfs" -> "sysfs")
  kobject: fix dereference before null check on kobj
  Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
  init/config: Do not select BUILD_BIN2C for IKCONFIG
  Provide in-kernel headers to make extending kernel easier
  kobject: Improve doc clarity kobject_init_and_add()
  kobject: Improve docs for kobject_add/del
  driver core: platform: Fix the usage of platform device name(pdev->name)
  livepatch: Replace klp_ktype_patch's default_attrs with groups
  cpufreq: schedutil: Replace default_attrs field with groups
  padata: Replace padata_attr_type default_attrs field with groups
  irqdesc: Replace irq_kobj_type's default_attrs field with groups
  net-sysfs: Replace ktype default_attrs field with groups
  block: Replace all ktype default_attrs with groups
  samples/kobject: Replace foo_ktype's default_attrs field with groups
  kobject: Add support for default attribute groups to kobj_type
  driver core: Postpone DMA tear-down until after devres release for probe failure
  ...
torvalds committed May 7, 2019
2 parents 01e5d18 + 70e16a6 commit cf482a4
Showing 68 changed files with 1,854 additions and 274 deletions.
87 changes: 86 additions & 1 deletion Documentation/ABI/stable/sysfs-devices-node
@@ -90,4 +90,89 @@ Date: December 2009
Contact: Lee Schermerhorn <[email protected]>
Description:
The node's huge page size control/query attributes.
See Documentation/admin-guide/mm/hugetlbpage.rst
See Documentation/admin-guide/mm/hugetlbpage.rst

What: /sys/devices/system/node/nodeX/accessY/
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The node's relationship to other nodes for access class "Y".

What: /sys/devices/system/node/nodeX/accessY/initiators/
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The directory containing symlinks to memory initiator
nodes that have class "Y" access to this target node's
memory. CPUs and other memory initiators in nodes not in
the list accessing this node's memory may have different
performance.

What: /sys/devices/system/node/nodeX/accessY/targets/
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The directory containing symlinks to memory targets to which
this initiator node has class "Y" access.

What: /sys/devices/system/node/nodeX/accessY/initiators/read_bandwidth
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
This node's read bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/read_latency
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
This node's read latency in nanoseconds when accessed
from nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/write_bandwidth
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
This node's write bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/write_latency
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
This node's write latency in nanoseconds when accessed
from nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The directory containing attributes for the memory-side cache
level 'Y'.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/indexing
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The cache's associativity indexing: 0 for direct mapped,
non-zero if indexed.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/line_size
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The number of bytes accessed from the next cache level on a
cache miss.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/size
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The size of this memory-side cache in bytes.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/write_policy
Date: December 2018
Contact: Keith Busch <[email protected]>
Description:
The cache write policy: 0 for write-back, 1 for write-through,
and any other value for other or unknown policies.
169 changes: 169 additions & 0 deletions Documentation/admin-guide/mm/numaperf.rst
@@ -0,0 +1,169 @@
.. _numaperf:

=============
NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory-only nodes. While memory-only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory-only node for each compute node:

+------------------+     +------------------+
| Compute Node 0   +-----+ Compute Node 1   |
| Local Node0 Mem  |     | Local Node1 Mem  |
+--------+---------+     +--------+---------+
         |                        |
+--------+---------+     +--------+---------+
| Slower Node2 Mem |     | Slower Node3 Mem |
+------------------+     +--------+---------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.

To aid applications in matching memory targets with their initiators, the
kernel provides symlinks between them. The following example lists the
relationship for the access class "0" memory initiators and targets::

    # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
    relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

    # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
    relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

A memory initiator may have multiple memory targets in the same access
class. The initiators linked to a target in a given class all share the
same performance when accessing that target, relative to other initiator
nodes. The targets within an initiator's access class, though, do not
necessarily perform the same as each other.

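Since each link is named after the node it points to, the linked node ids can be recovered by scanning one of these directories. A minimal user-space sketch, assuming only the `nodeN` link naming shown in the example above; `parse_node_entry()` and `list_linked_nodes()` are hypothetical helpers, not part of the kernel or of any library:

```c
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse a link name such as "node3" into its node id.
 * Returns 1 and stores the id on success, or 0 for any other entry
 * (".", "..", attribute files, or "node" not followed only by digits). */
int parse_node_entry(const char *name, int *id)
{
	char *end;
	long n;

	assert(name != NULL && id != NULL);
	if (strncmp(name, "node", 4) != 0 || name[4] < '0' || name[4] > '9')
		return 0;
	n = strtol(name + 4, &end, 10);
	if (*end != '\0' || n > INT_MAX)
		return 0;
	*id = (int)n;
	return 1;
}

/* Collect the node ids linked from a directory such as
 * /sys/devices/system/node/nodeY/access0/initiators/.
 * Stores at most max ids, in directory order (unsorted), and returns
 * how many were stored, or -1 if the directory cannot be opened. */
int list_linked_nodes(const char *dir, int *ids, int max)
{
	struct dirent *de;
	int count = 0;
	DIR *d;

	assert(dir != NULL && (ids != NULL || max == 0));
	d = opendir(dir);
	if (!d)
		return -1;
	while (count < max && (de = readdir(d)) != NULL) {
		int id;

		if (parse_node_entry(de->d_name, &id))
			ids[count++] = id;
	}
	closedir(d);
	return count;
}
```

The name filter matters because an initiators directory can also hold the performance attribute files described below. readdir() returns entries in no particular order, so sort the ids if order matters.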
================
NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

    /sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access class's initiators.

The performance characteristics the kernel provides for the local initiators
are exported as follows::

    # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
    /sys/devices/system/node/nodeY/access0/initiators/
    |-- read_bandwidth
    |-- read_latency
    |-- write_bandwidth
    `-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

The values reported here correspond to the rated latency and bandwidth
for the platform.
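When the attributes are present, each one can be read as a small text file. A minimal user-space sketch, assuming each attribute holds a single unsigned decimal value terminated by a newline (the usual sysfs convention); `access0_attr_path()` and `read_attr_ulong()` are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build the path of one access class 0 performance attribute of target
 * node `node` (e.g. "read_latency" or "write_bandwidth").
 * Returns 0 on success, or -1 if buf is too small. */
int access0_attr_path(char *buf, size_t len, int node, const char *attr)
{
	int n;

	assert(buf != NULL && attr != NULL);
	n = snprintf(buf, len,
		     "/sys/devices/system/node/node%d/access0/initiators/%s",
		     node, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one unsigned decimal value from an attribute file.
 * Returns 0 on success, or -1 if the file is missing (for example,
 * because the platform does not provide the attribute) or malformed.
 * *out is only written on success. */
int read_attr_ulong(const char *path, unsigned long *out)
{
	char line[32];
	unsigned long v;
	char *end;
	FILE *f;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (line[0] < '0' || line[0] > '9')
		return -1;
	v = strtoul(line, &end, 10);
	if (*end != '\n' && *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```

A caller could, for instance, read read_latency for two candidate target nodes and prefer the smaller value; a return of -1 simply means no value was available for that node.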

==========
NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller, higher performing memory. The
system physical addresses that memory initiators are aware of are provided
by the last memory level in the hierarchy. The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering is different from CPU caches, where the cache level (e.g.
L1, L2, L3) uses the CPU-side view and each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

    /sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

    /sys/devices/system/node/nodeX/memory_side_cache/indexA/
    /sys/devices/system/node/nodeX/memory_side_cache/indexB/
    /sys/devices/system/node/nodeX/memory_side_cache/indexC/

Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::

    # tree /sys/devices/system/node/node0/memory_side_cache/
    /sys/devices/system/node/node0/memory_side_cache/
    |-- index1
    |   |-- indexing
    |   |-- line_size
    |   |-- size
    |   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.

========
See Also
========
.. [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
       Section 5.2.27
16 changes: 9 additions & 7 deletions Documentation/filesystems/debugfs.txt
@@ -31,10 +31,10 @@ This call, if successful, will make a directory called name underneath the
indicated parent directory. If parent is NULL, the directory will be
created in the debugfs root. On success, the return value is a struct
dentry pointer which can be used to create files in the directory (and to
clean it up at the end). A NULL return value indicates that something went
wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
kernel has been built without debugfs support and none of the functions
described below will work.
clean it up at the end). An ERR_PTR(-ERROR) return value indicates that
something went wrong. If ERR_PTR(-ENODEV) is returned, that is an
indication that the kernel has been built without debugfs support and none
of the functions described below will work.

The most general way to create a file within a debugfs directory is with:

@@ -48,8 +48,9 @@ should hold the file, data will be stored in the i_private field of the
resulting inode structure, and fops is a set of file operations which
implement the file's behavior. At a minimum, the read() and/or write()
operations should be provided; others can be included as needed. Again,
the return value will be a dentry pointer to the created file, NULL for
error, or ERR_PTR(-ENODEV) if debugfs support is missing.
the return value will be a dentry pointer to the created file,
ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
missing.

To create a file with an initial size, the following function can be used
instead:
@@ -214,7 +215,8 @@ can be removed with:

void debugfs_remove(struct dentry *dentry);

The dentry value can be NULL, in which case nothing will be removed.
The dentry value can be NULL or an error value, in which case nothing will
be removed.
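The change above makes error values, rather than NULL, the failure signal, so callers should test results with IS_ERR(). The block below is a user-space re-creation of the pointer-encoding idea from the kernel's include/linux/err.h, simplified (no __must_check or unlikely() annotations); `struct demo_dentry`, `check_create_result()` and `would_remove()` are hypothetical stand-ins, not debugfs API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* The top MAX_ERRNO addresses are never valid pointers, so a negative
 * errno can be carried in a pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical stand-in for the object a create call returns. */
struct demo_dentry {
	const char *name;
};

/* Caller-side pattern: test with IS_ERR(), not against NULL.
 * Returns 0 for a usable result, or the negative errno it encodes. */
int check_create_result(const struct demo_dentry *d)
{
	if (IS_ERR(d))
		return (int)PTR_ERR(d);	/* e.g. -ENODEV without debugfs */
	return 0;
}

/* Mirrors the documented debugfs_remove() contract: NULL and error
 * values remove nothing. Returns true if removal would happen. */
bool would_remove(const struct demo_dentry *d)
{
	return !IS_ERR_OR_NULL(d);
}
```

Because error values occupy addresses that no real object can have, a result such as ERR_PTR(-ENODEV) is always distinguishable from a valid pointer, and PTR_ERR() recovers the errno.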

Once upon a time, debugfs users were required to remember the dentry
pointer for every debugfs file they created so that all files could be
2 changes: 1 addition & 1 deletion arch/arm64/kernel/acpi_numa.c
@@ -45,7 +45,7 @@ static inline int get_cpu_for_acpi_id(u32 uid)
	return -EINVAL;
}

static int __init acpi_parse_gicc_pxm(struct acpi_subtable_header *header,
static int __init acpi_parse_gicc_pxm(union acpi_subtable_headers *header,
				      const unsigned long end)
{
	struct acpi_srat_gicc_affinity *pa;
4 changes: 2 additions & 2 deletions arch/arm64/kernel/smp.c
@@ -586,7 +586,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
}

static int __init
acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header,
acpi_parse_gic_cpu_interface(union acpi_subtable_headers *header,
			     const unsigned long end)
{
	struct acpi_madt_generic_interrupt *processor;
@@ -595,7 +595,7 @@ acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header,
	if (BAD_MADT_GICC_ENTRY(processor, end))
		return -EINVAL;

	acpi_table_print_madt_entry(header);
	acpi_table_print_madt_entry(&header->common);

	acpi_map_gic_cpu_interface(processor);

14 changes: 7 additions & 7 deletions arch/ia64/kernel/acpi.c
@@ -177,7 +177,7 @@ struct acpi_table_madt *acpi_madt __initdata;
static u8 has_8259;

static int __init
acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header,
			  const unsigned long end)
{
	struct acpi_madt_local_apic_override *lapic;
@@ -195,7 +195,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
}

static int __init
acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)
acpi_parse_lsapic(union acpi_subtable_headers *header, const unsigned long end)
{
	struct acpi_madt_local_sapic *lsapic;

@@ -216,7 +216,7 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)
}

static int __init
acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end)
acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end)
{
	struct acpi_madt_local_apic_nmi *lacpi_nmi;

@@ -230,7 +230,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e
}

static int __init
acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end)
acpi_parse_iosapic(union acpi_subtable_headers * header, const unsigned long end)
{
	struct acpi_madt_io_sapic *iosapic;

@@ -245,7 +245,7 @@ acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end
static unsigned int __initdata acpi_madt_rev;

static int __init
acpi_parse_plat_int_src(struct acpi_subtable_header * header,
acpi_parse_plat_int_src(union acpi_subtable_headers * header,
			const unsigned long end)
{
	struct acpi_madt_interrupt_source *plintsrc;
@@ -329,7 +329,7 @@ unsigned int get_cpei_target_cpu(void)
}

static int __init
acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
acpi_parse_int_src_ovr(union acpi_subtable_headers * header,
		       const unsigned long end)
{
	struct acpi_madt_interrupt_override *p;
@@ -350,7 +350,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
}

static int __init
acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end)
acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end)
{
	struct acpi_madt_nmi_source *nmi_src;
