diff --git a/CREDITS b/CREDITS index e97bea06b59ff12bc827c0661fb57974a6eff4d6..c62dcb3b7e2621d918815a656f2682fda044afcb 100644 --- a/CREDITS +++ b/CREDITS @@ -317,6 +317,14 @@ S: 2322 37th Ave SW S: Seattle, Washington 98126-2010 S: USA +N: Muli Ben-Yehuda +E: mulix@mulix.org +E: muli@il.ibm.com +W: http://www.mulix.org +D: trident OSS sound driver, x86-64 dma-ops and Calgary IOMMU, +D: KVM and Xen bits and other misc. hackery. +S: Haifa, Israel + N: Johannes Berg E: johannes@sipsolutions.net W: http://johannes.sipsolutions.net/ @@ -3344,8 +3352,7 @@ S: Spain N: Linus Torvalds E: torvalds@linux-foundation.org D: Original kernel hacker -S: 12725 SW Millikan Way, Suite 400 -S: Beaverton, Oregon 97005 +S: Portland, Oregon 97005 S: USA N: Marcelo Tosatti diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 1977fab386566e23f501bea233f8f487d16f5c38..5b5aba404aacb69160f0d88301be0a76aea78682 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -89,8 +89,6 @@ cciss.txt - info, major/minor #'s for Compaq's SMART Array Controllers. cdrom/ - directory with information on the CD-ROM drivers that Linux has. -cli-sti-removal.txt - - cli()/sti() removal guide. computone.txt - info on Computone Intelliport II/Plus Multiport Serial Driver. connector/ @@ -361,8 +359,6 @@ telephony/ - directory with info on telephony (e.g. voice over IP) support. time_interpolators.txt - info on time interpolators. -tipar.txt - - information about Parallel link cable for Texas Instruments handhelds. tty.txt - guide to the locking policies of the tty layer. uml/ diff --git a/Documentation/ABI/testing/sysfs-class-regulator b/Documentation/ABI/testing/sysfs-class-regulator new file mode 100644 index 0000000000000000000000000000000000000000..79a4a75b2d2ceb7af8e07c3efa6289c3722d6342 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-regulator @@ -0,0 +1,315 @@ +What: /sys/class/regulator/.../state +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + state. This holds the regulator output state. + + This will be one of the following strings: + + 'enabled' + 'disabled' + 'unknown' + + 'enabled' means the regulator output is ON and is supplying + power to the system. + + 'disabled' means the regulator output is OFF and is not + supplying power to the system.. + + 'unknown' means software cannot determine the state. + + NOTE: this field can be used in conjunction with microvolts + and microamps to determine regulator output levels. + + +What: /sys/class/regulator/.../type +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + type. This holds the regulator type. + + This will be one of the following strings: + + 'voltage' + 'current' + 'unknown' + + 'voltage' means the regulator output voltage can be controlled + by software. + + 'current' means the regulator output current limit can be + controlled by software. + + 'unknown' means software cannot control either voltage or + current limit. + + +What: /sys/class/regulator/.../microvolts +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + microvolts. This holds the regulator output voltage setting + measured in microvolts (i.e. E-6 Volts). + + NOTE: This value should not be used to determine the regulator + output voltage level as this value is the same regardless of + whether the regulator is enabled or disabled. + + +What: /sys/class/regulator/.../microamps +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + microamps. This holds the regulator output current limit + setting measured in microamps (i.e. E-6 Amps). + + NOTE: This value should not be used to determine the regulator + output current level as this value is the same regardless of + whether the regulator is enabled or disabled. + + +What: /sys/class/regulator/.../opmode +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + opmode. This holds the regulator operating mode setting. + + The opmode value can be one of the following strings: + + 'fast' + 'normal' + 'idle' + 'standby' + 'unknown' + + The modes are described in include/linux/regulator/regulator.h + + NOTE: This value should not be used to determine the regulator + output operating mode as this value is the same regardless of + whether the regulator is enabled or disabled. + + +What: /sys/class/regulator/.../min_microvolts +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + min_microvolts. This holds the minimum safe working regulator + output voltage setting for this domain measured in microvolts. + + NOTE: this will return the string 'constraint not defined' if + the power domain has no min microvolts constraint defined by + platform code. + + +What: /sys/class/regulator/.../max_microvolts +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + max_microvolts. This holds the maximum safe working regulator + output voltage setting for this domain measured in microvolts. + + NOTE: this will return the string 'constraint not defined' if + the power domain has no max microvolts constraint defined by + platform code. + + +What: /sys/class/regulator/.../min_microamps +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + min_microamps. This holds the minimum safe working regulator + output current limit setting for this domain measured in + microamps. + + NOTE: this will return the string 'constraint not defined' if + the power domain has no min microamps constraint defined by + platform code. + + +What: /sys/class/regulator/.../max_microamps +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + max_microamps. This holds the maximum safe working regulator + output current limit setting for this domain measured in + microamps. + + NOTE: this will return the string 'constraint not defined' if + the power domain has no max microamps constraint defined by + platform code. + + +What: /sys/class/regulator/.../num_users +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + num_users. This holds the number of consumer devices that + have called regulator_enable() on this regulator. + + +What: /sys/class/regulator/.../requested_microamps +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + requested_microamps. This holds the total requested load + current in microamps for this regulator from all its consumer + devices. + + +What: /sys/class/regulator/.../parent +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Some regulator directories will contain a link called parent. + This points to the parent or supply regulator if one exists. + +What: /sys/class/regulator/.../suspend_mem_microvolts +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_mem_microvolts. This holds the regulator output + voltage setting for this domain measured in microvolts when + the system is suspended to memory. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to memory voltage defined by + platform code. + +What: /sys/class/regulator/.../suspend_disk_microvolts +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_disk_microvolts. This holds the regulator output + voltage setting for this domain measured in microvolts when + the system is suspended to disk. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to disk voltage defined by + platform code. + +What: /sys/class/regulator/.../suspend_standby_microvolts +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_standby_microvolts. This holds the regulator output + voltage setting for this domain measured in microvolts when + the system is suspended to standby. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to standby voltage defined by + platform code. + +What: /sys/class/regulator/.../suspend_mem_mode +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_mem_mode. This holds the regulator operating mode + setting for this domain when the system is suspended to + memory. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to memory mode defined by + platform code. + +What: /sys/class/regulator/.../suspend_disk_mode +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_disk_mode. This holds the regulator operating mode + setting for this domain when the system is suspended to disk. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to disk mode defined by + platform code. + +What: /sys/class/regulator/.../suspend_standby_mode +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_standby_mode. This holds the regulator operating mode + setting for this domain when the system is suspended to + standby. + + NOTE: this will return the string 'not defined' if + the power domain has no suspend to standby mode defined by + platform code. + +What: /sys/class/regulator/.../suspend_mem_state +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_mem_state. This holds the regulator operating state + when suspended to memory. + + This will be one of the following strings: + + 'enabled' + 'disabled' + 'not defined' + +What: /sys/class/regulator/.../suspend_disk_state +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_disk_state. This holds the regulator operating state + when suspended to disk. + + This will be one of the following strings: + + 'enabled' + 'disabled' + 'not defined' + +What: /sys/class/regulator/.../suspend_standby_state +Date: May 2008 +KernelVersion: 2.6.26 +Contact: Liam Girdwood +Description: + Each regulator directory will contain a field called + suspend_standby_state. This holds the regulator operating + state when suspended to standby. + + This will be one of the following strings: + + 'enabled' + 'disabled' + 'not defined' diff --git a/Documentation/ABI/testing/sysfs-dev b/Documentation/ABI/testing/sysfs-dev new file mode 100644 index 0000000000000000000000000000000000000000..a9f2b8b0530fb193be12ccfb7d0a529bf0ba435d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-dev @@ -0,0 +1,20 @@ +What: /sys/dev +Date: April 2008 +KernelVersion: 2.6.26 +Contact: Dan Williams +Description: The /sys/dev tree provides a method to look up the sysfs + path for a device using the information returned from + stat(2). There are two directories, 'block' and 'char', + beneath /sys/dev containing symbolic links with names of + the form ":". These links point to the + corresponding sysfs path for the given device. + + Example: + $ readlink /sys/dev/block/8:32 + ../../block/sdc + + Entries in /sys/dev/char and /sys/dev/block will be + dynamically created and destroyed as devices enter and + leave the system. + +Users: mdadm diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory new file mode 100644 index 0000000000000000000000000000000000000000..7a16fe1e2270d8e7353b209bd0dad24a5ce9b69e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -0,0 +1,24 @@ +What: /sys/devices/system/memory +Date: June 2008 +Contact: Badari Pulavarty +Description: + The /sys/devices/system/memory contains a snapshot of the + internal state of the kernel memory blocks. Files could be + added or removed dynamically to represent hot-add/remove + operations. + +Users: hotplug memory add/remove tools + https://w3.opensource.ibm.com/projects/powerpc-utils/ + +What: /sys/devices/system/memory/memoryX/removable +Date: June 2008 +Contact: Badari Pulavarty +Description: + The file /sys/devices/system/memory/memoryX/removable + indicates whether this memory block is removable or not. + This is useful for a user-level agent to determine + identify removable sections of the memory before attempting + potentially expensive hot-remove memory operation + +Users: hotplug memory remove tools + https://w3.opensource.ibm.com/projects/powerpc-utils/ diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi index 9470ed9afcc07238077440312c24af3d5ef88ad0..f27be7d1a49f4038951827c583734b1cf1be3125 100644 --- a/Documentation/ABI/testing/sysfs-firmware-acpi +++ b/Documentation/ABI/testing/sysfs-firmware-acpi @@ -29,46 +29,46 @@ Description: $ cd /sys/firmware/acpi/interrupts $ grep . * - error:0 - ff_gbl_lock:0 - ff_pmtimer:0 - ff_pwr_btn:0 - ff_rt_clk:0 - ff_slp_btn:0 - gpe00:0 - gpe01:0 - gpe02:0 - gpe03:0 - gpe04:0 - gpe05:0 - gpe06:0 - gpe07:0 - gpe08:0 - gpe09:174 - gpe0A:0 - gpe0B:0 - gpe0C:0 - gpe0D:0 - gpe0E:0 - gpe0F:0 - gpe10:0 - gpe11:60 - gpe12:0 - gpe13:0 - gpe14:0 - gpe15:0 - gpe16:0 - gpe17:0 - gpe18:0 - gpe19:7 - gpe1A:0 - gpe1B:0 - gpe1C:0 - gpe1D:0 - gpe1E:0 - gpe1F:0 - gpe_all:241 - sci:241 + error: 0 + ff_gbl_lock: 0 enable + ff_pmtimer: 0 invalid + ff_pwr_btn: 0 enable + ff_rt_clk: 2 disable + ff_slp_btn: 0 invalid + gpe00: 0 invalid + gpe01: 0 enable + gpe02: 108 enable + gpe03: 0 invalid + gpe04: 0 invalid + gpe05: 0 invalid + gpe06: 0 enable + gpe07: 0 enable + gpe08: 0 invalid + gpe09: 0 invalid + gpe0A: 0 invalid + gpe0B: 0 invalid + gpe0C: 0 invalid + gpe0D: 0 invalid + gpe0E: 0 invalid + gpe0F: 0 invalid + gpe10: 0 invalid + gpe11: 0 invalid + gpe12: 0 invalid + gpe13: 0 invalid + gpe14: 0 invalid + gpe15: 0 invalid + gpe16: 0 invalid + gpe17: 1084 enable + gpe18: 0 enable + gpe19: 0 invalid + gpe1A: 0 invalid + gpe1B: 0 invalid + gpe1C: 0 invalid + gpe1D: 0 invalid + gpe1E: 0 invalid + gpe1F: 0 invalid + gpe_all: 1192 + sci: 1194 sci - The total number of times the ACPI SCI has claimed an interrupt. @@ -89,6 +89,13 @@ Description: error - an interrupt that can't be accounted for above. + invalid: it's either a wakeup GPE or a GPE/Fixed Event that + doesn't have an event handler. + + disable: the GPE/Fixed Event is valid but disabled. + + enable: the GPE/Fixed Event is valid and enabled. + Root has permission to clear any of these counters. Eg. # echo 0 > gpe11 @@ -97,3 +104,43 @@ Description: None of these counters has an effect on the function of the system, they are simply statistics. + + Besides this, user can also write specific strings to these files + to enable/disable/clear ACPI interrupts in user space, which can be + used to debug some ACPI interrupt storm issues. + + Note that only writting to VALID GPE/Fixed Event is allowed, + i.e. user can only change the status of runtime GPE and + Fixed Event with event handler installed. + + Let's take power button fixed event for example, please kill acpid + and other user space applications so that the machine won't shutdown + when pressing the power button. + # cat ff_pwr_btn + 0 + # press the power button for 3 times; + # cat ff_pwr_btn + 3 + # echo disable > ff_pwr_btn + # cat ff_pwr_btn + disable + # press the power button for 3 times; + # cat ff_pwr_btn + disable + # echo enable > ff_pwr_btn + # cat ff_pwr_btn + 4 + /* + * this is because the status bit is set even if the enable bit is cleared, + * and it triggers an ACPI fixed event when the enable bit is set again + */ + # press the power button for 3 times; + # cat ff_pwr_btn + 7 + # echo disable > ff_pwr_btn + # press the power button for 3 times; + # echo clear > ff_pwr_btn /* clear the status bit */ + # echo disable > ff_pwr_btn + # cat ff_pwr_btn + 7 + diff --git a/Documentation/ABI/testing/sysfs-kernel-mm b/Documentation/ABI/testing/sysfs-kernel-mm new file mode 100644 index 0000000000000000000000000000000000000000..190d523ac159f64c97ae7355f230c46f564e5f6b --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-mm @@ -0,0 +1,6 @@ +What: /sys/kernel/mm +Date: July 2008 +Contact: Nishanth Aravamudan , VM maintainers +Description: + /sys/kernel/mm/ should contain any and all VM + related information in /sys/kernel/. diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-hugepages b/Documentation/ABI/testing/sysfs-kernel-mm-hugepages new file mode 100644 index 0000000000000000000000000000000000000000..e21c00571cf4f082b04f12b168e376c500bd845d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-mm-hugepages @@ -0,0 +1,15 @@ +What: /sys/kernel/mm/hugepages/ +Date: June 2008 +Contact: Nishanth Aravamudan , hugetlb maintainers +Description: + /sys/kernel/mm/hugepages/ contains a number of subdirectories + of the form hugepages-kB, where is the page size + of the hugepages supported by the kernel/CPU combination. + + Under these directories are a number of files: + nr_hugepages + nr_overcommit_hugepages + free_hugepages + surplus_hugepages + resv_hugepages + See Documentation/vm/hugetlbpage.txt for details. diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 6caa146155788c8a67fc940c1879f2faf4fac1bb..1875e502f87205a1dcaf268d0af78127ee336144 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -474,25 +474,29 @@ make a good program). So, you can either get rid of GNU emacs, or change it to use saner values. To do the latter, you can stick the following in your .emacs file: -(defun linux-c-mode () - "C mode with adjusted defaults for use with the Linux kernel." - (interactive) - (c-mode) - (c-set-style "K&R") - (setq tab-width 8) - (setq indent-tabs-mode t) - (setq c-basic-offset 8)) - -This will define the M-x linux-c-mode command. When hacking on a -module, if you put the string -*- linux-c -*- somewhere on the first -two lines, this mode will be automatically invoked. Also, you may want -to add - -(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode) - auto-mode-alist)) - -to your .emacs file if you want to have linux-c-mode switched on -automagically when you edit source files under /usr/src/linux. +(defun c-lineup-arglist-tabs-only (ignored) + "Line up argument lists by tabs, not spaces" + (let* ((anchor (c-langelem-pos c-syntactic-element)) + (column (c-langelem-2nd-pos c-syntactic-element)) + (offset (- (1+ column) anchor)) + (steps (floor offset c-basic-offset))) + (* (max steps 1) + c-basic-offset))) + +(add-hook 'c-mode-hook + (lambda () + (let ((filename (buffer-file-name))) + ;; Enable kernel mode for the appropriate files + (when (and filename + (string-match "~/src/linux-trees" filename)) + (setq indent-tabs-mode t) + (c-set-style "linux") + (c-set-offset 'arglist-cont-nonempty + '(c-lineup-gcc-asm-reg + c-lineup-arglist-tabs-only)))))) + +This will make emacs go better with the kernel coding style for C +files below ~/src/linux-trees. But even if you fail in getting emacs to do sane formatting, not everything is lost: use "indent". diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 80d150458c80c5ac7f7f5f75b3e6ff8a602a19ca..d8b63d164e41193927af2c7fb41dcb0893f57878 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -298,10 +298,10 @@ recommended that you never use these unless you really know what the cache width is. int -dma_mapping_error(dma_addr_t dma_addr) +dma_mapping_error(struct device *dev, dma_addr_t dma_addr) int -pci_dma_mapping_error(dma_addr_t dma_addr) +pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr) In some circumstances dma_map_single and dma_map_page will fail to create a mapping. A driver can check for these errors by testing the returned diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt index 6d772f84b477c9f5a4698e87d50fb63649af83d3..b768cc0e402b8e6f1cfa8ae0b0f23c4e1c50168a 100644 --- a/Documentation/DMA-attributes.txt +++ b/Documentation/DMA-attributes.txt @@ -22,3 +22,12 @@ ready and available in memory. The DMA of the "completion indication" could race with data DMA. Mapping the memory used for completion indications with DMA_ATTR_WRITE_BARRIER would prevent the race. +DMA_ATTR_WEAK_ORDERING +---------------------- + +DMA_ATTR_WEAK_ORDERING specifies that reads and writes to the mapping +may be weakly ordered, that is that reads and writes may pass each other. + +Since it is optional for platforms to implement DMA_ATTR_WEAK_ORDERING, +those that do not will simply ignore the attribute and exhibit default +behavior. diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 0eb0d027eb32e139f442eb30e3b92a7560112bed..1d1b34500b69278e5f217c6b521d729a1c78b4f9 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -12,7 +12,7 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ - mac80211.xml debugobjects.xml + mac80211.xml debugobjects.xml sh.xml ### # The build process is as follows (targets): diff --git a/Documentation/DocBook/gadget.tmpl b/Documentation/DocBook/gadget.tmpl index 5a8ffa761e09991bfc59dc97fa01a4deef94e9f7..ea3bc9565e6a7e7ae48a168841c222e07bfd0154 100644 --- a/Documentation/DocBook/gadget.tmpl +++ b/Documentation/DocBook/gadget.tmpl @@ -524,6 +524,44 @@ These utilities include endpoint autoconfiguration. +Composite Device Framework + +The core API is sufficient for writing drivers for composite +USB devices (with more than one function in a given configuration), +and also multi-configuration devices (also more than one function, +but not necessarily sharing a given configuration). +There is however an optional framework which makes it easier to +reuse and combine functions. + + +Devices using this framework provide a struct +usb_composite_driver, which in turn provides one or +more struct usb_configuration instances. +Each such configuration includes at least one +struct usb_function, which packages a user +visible role such as "network link" or "mass storage device". +Management functions may also exist, such as "Device Firmware +Upgrade". + + +!Iinclude/linux/usb/composite.h +!Edrivers/usb/gadget/composite.c + + + +Composite Device Functions + +At this writing, a few of the current gadget drivers have +been converted to this framework. +Near-term plans include converting all of them, except for "gadgetfs". + + +!Edrivers/usb/gadget/f_acm.c +!Edrivers/usb/gadget/f_serial.c + + + + Peripheral Controller Drivers diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 2510763295d09dfbd893e288d1948c9b05feda4d..084f6ad7b7a0a4729b85aea3cc5c3e6eb6557689 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -219,10 +219,10 @@ - Three Main Types of Kernel Locks: Spinlocks, Mutexes and Semaphores + Two Main Types of Kernel Locks: Spinlocks and Mutexes - There are three main types of kernel locks. The fundamental type + There are two main types of kernel locks. The fundamental type is the spinlock (include/asm/spinlock.h), which is a very simple single-holder lock: if you can't get the @@ -239,14 +239,6 @@ can't sleep (see ), and so have to use a spinlock instead. - - The third type is a semaphore - (include/linux/semaphore.h): it - can have more than one holder at any time (the number decided at - initialization time), although it is most commonly used as a - single-holder lock (a mutex). If you can't get a semaphore, your - task will be suspended and later on woken up - just like for mutexes. - Neither type of lock is recursive: see . @@ -278,7 +270,7 @@ - Semaphores still exist, because they are required for + Mutexes still exist, because they are required for synchronization between user contexts, as we will see below. @@ -289,18 +281,17 @@ If you have a data structure which is only ever accessed from - user context, then you can use a simple semaphore - (linux/linux/semaphore.h) to protect it. This - is the most trivial case: you initialize the semaphore to the number - of resources available (usually 1), and call - down_interruptible() to grab the semaphore, and - up() to release it. There is also a - down(), which should be avoided, because it + user context, then you can use a simple mutex + (include/linux/mutex.h) to protect it. This + is the most trivial case: you initialize the mutex. Then you can + call mutex_lock_interruptible() to grab the mutex, + and mutex_unlock() to release it. There is also a + mutex_lock(), which should be avoided, because it will not return if a signal is received. - Example: linux/net/core/netfilter.c allows + Example: net/netfilter/nf_sockopt.c allows registration of new setsockopt() and getsockopt() calls, with nf_register_sockopt(). Registration and @@ -515,7 +506,7 @@ If you are in a process context (any syscall) and want to - lock other process out, use a semaphore. You can take a semaphore + lock other process out, use a mutex. You can take a mutex and sleep (copy_from_user*( or kmalloc(x,GFP_KERNEL)). @@ -662,7 +653,7 @@ SLBH SLBH SLBH -DI +MLI None @@ -692,8 +683,8 @@ spin_lock_bh -DI -down_interruptible +MLI +mutex_lock_interruptible @@ -1310,7 +1301,7 @@ as Alan Cox says, Lock data, not code. There is a coding bug where a piece of code tries to grab a spinlock twice: it will spin forever, waiting for the lock to - be released (spinlocks, rwlocks and semaphores are not + be released (spinlocks, rwlocks and mutexes are not recursive in Linux). This is trivial to diagnose: not a stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem. @@ -1335,7 +1326,7 @@ as Alan Cox says, Lock data, not code. This complete lockup is easy to diagnose: on SMP boxes the - watchdog timer or compiling with DEBUG_SPINLOCKS set + watchdog timer or compiling with DEBUG_SPINLOCK set (include/linux/spinlock.h) will show this up immediately when it happens. @@ -1558,7 +1549,7 @@ the amount of locking which needs to be done. Read/Write Lock Variants - Both spinlocks and semaphores have read/write variants: + Both spinlocks and mutexes have read/write variants: rwlock_t and struct rw_semaphore. These divide users into two classes: the readers and the writers. If you are only reading the data, you can get a read lock, but to write to @@ -1681,7 +1672,7 @@ the amount of locking which needs to be done. #include <linux/slab.h> #include <linux/string.h> +#include <linux/rcupdate.h> - #include <linux/semaphore.h> + #include <linux/mutex.h> #include <asm/errno.h> struct object @@ -1913,7 +1904,7 @@ machines due to caching. - put_user() + put_user() @@ -1927,13 +1918,13 @@ machines due to caching. - down_interruptible() and - down() + mutex_lock_interruptible() and + mutex_lock() - There is a down_trylock() which can be + There is a mutex_trylock() which can be used inside interrupt context, as it will not sleep. - up() will also never sleep. + mutex_unlock() will also never sleep. @@ -2023,7 +2014,7 @@ machines due to caching. Prior to 2.5, or when CONFIG_PREEMPT is unset, processes in user context inside the kernel would not - preempt each other (ie. you had that CPU until you have it up, + preempt each other (ie. you had that CPU until you gave it up, except for interrupts). With the addition of CONFIG_PREEMPT in 2.5.4, this changed: when in user context, higher priority tasks can "cut in": spinlocks diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl index e8acd1f034567b217f9e9998f5bc05b4c24e276e..372dec20c8dab6db05fbbcd9e3cdef93f1e783a5 100644 --- a/Documentation/DocBook/kgdb.tmpl +++ b/Documentation/DocBook/kgdb.tmpl @@ -98,6 +98,24 @@ "Kernel debugging" select "KGDB: kernel debugging with remote gdb". + It is advised, but not required that you turn on the + CONFIG_FRAME_POINTER kernel option. This option inserts code to + into the compiled executable which saves the frame information in + registers or on the stack at different points which will allow a + debugger such as gdb to more accurately construct stack back traces + while debugging the kernel. + + + If the architecture that you are using supports the kernel option + CONFIG_DEBUG_RODATA, you should consider turning it off. This + option will prevent the use of software breakpoints because it + marks certain regions of the kernel's memory space as read-only. + If kgdb supports it for the architecture you are using, you can + use hardware breakpoints if you desire to run with the + CONFIG_DEBUG_RODATA option turned on, else you need to turn off + this option. + + Next you should choose one of more I/O drivers to interconnect debugging host and debugged target. Early boot debugging requires a KGDB I/O driver that supports early debugging and the driver must be diff --git a/Documentation/DocBook/procfs-guide.tmpl b/Documentation/DocBook/procfs-guide.tmpl index 1fd6a1ec7591d5f4179cdf2a641f1b055c1f486a..8a5dc6e021ffa8b16b6dc2f997e115e77b1aaf70 100644 --- a/Documentation/DocBook/procfs-guide.tmpl +++ b/Documentation/DocBook/procfs-guide.tmpl @@ -29,12 +29,12 @@ - 1.0  + 1.0 May 30, 2001 Initial revision posted to linux-kernel - 1.1  + 1.1 June 3, 2001 Revised after comments from linux-kernel diff --git a/Documentation/DocBook/s390-drivers.tmpl b/Documentation/DocBook/s390-drivers.tmpl index 4acc73240a6d536ef044bf1dfc5a2abd6ae4296f..95bfc12e5439d572f02e924f887802fd273bcf96 100644 --- a/Documentation/DocBook/s390-drivers.tmpl +++ b/Documentation/DocBook/s390-drivers.tmpl @@ -100,7 +100,7 @@ the hardware structures represented here, please consult the Principles of Operation. -!Iinclude/asm-s390/cio.h +!Iarch/s390/include/asm/cio.h ccw devices @@ -114,7 +114,7 @@ ccw device structure. Device drivers must not bypass those functions or strange side effects may happen. -!Iinclude/asm-s390/ccwdev.h +!Iarch/s390/include/asm/ccwdev.h !Edrivers/s390/cio/device.c !Edrivers/s390/cio/device_ops.c @@ -125,7 +125,7 @@ measurement data which is made available by the channel subsystem for each channel attached device. -!Iinclude/asm-s390/cmb.h +!Iarch/s390/include/asm/cmb.h !Edrivers/s390/cio/cmf.c @@ -142,7 +142,7 @@ ccw group devices -!Iinclude/asm-s390/ccwgroup.h +!Iarch/s390/include/asm/ccwgroup.h !Edrivers/s390/cio/ccwgroup.c diff --git a/Documentation/DocBook/sh.tmpl b/Documentation/DocBook/sh.tmpl new file mode 100644 index 0000000000000000000000000000000000000000..0c3dc4c69dd11f7c6a7b65877420395cad59bc6b --- /dev/null +++ b/Documentation/DocBook/sh.tmpl @@ -0,0 +1,105 @@ + + + + + + SuperH Interfaces Guide + + + + Paul + Mundt + +
+ lethal@linux-sh.org +
+
+
+
+ + + 2008 + Paul Mundt + + + 2008 + Renesas Technology Corp. + + + + + This documentation is free software; you can redistribute + it and/or modify it under the terms of the GNU General Public + License version 2 as published by the Free Software Foundation. + + + + This program is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied + warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + See the GNU General Public License for more details. + + + + You should have received a copy of the GNU General Public + License along with this program; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, + MA 02111-1307 USA + + + + For more details see the file COPYING in the source + distribution of Linux. + + +
+ + + + + Memory Management + + SH-4 + + Store Queue API +!Earch/sh/kernel/cpu/sh4/sq.c + + + + SH-5 + + TLB Interfaces +!Iarch/sh/mm/tlb-sh5.c +!Iarch/sh/include/asm/tlb_64.h + + + + + Clock Framework Extensions +!Iarch/sh/include/asm/clock.h + + + Machine Specific Interfaces + + mach-dreamcast +!Iarch/sh/boards/mach-dreamcast/rtc.c + + + mach-x3proto +!Earch/sh/boards/mach-x3proto/ilsel.c + + + + Busses + + SuperHyway +!Edrivers/sh/superhyway/superhyway.c + + + + Maple +!Edrivers/sh/maple/maple.c + + +
diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl index fdd7f4f887b75ba7bc96b7ab81b9abc7d3348d73..df87d1b93605ac54cf400bd7b8d44b0866d0789a 100644 --- a/Documentation/DocBook/uio-howto.tmpl +++ b/Documentation/DocBook/uio-howto.tmpl @@ -21,6 +21,18 @@ + + 2006-2008 + Hans-Jürgen Koch. + + + + +This documentation is Free Software licensed under the terms of the +GPL version 2. + + + 2006-12-11 @@ -29,6 +41,12 @@ + + 0.5 + 2008-05-22 + hjk + Added description of write() function. + 0.4 2007-11-26 @@ -57,20 +75,9 @@ - + About this document - - -Copyright and License - - Copyright (c) 2006 by Hans-Jürgen Koch. - -This documentation is Free Software licensed under the terms of the -GPL version 2. - - - Translations @@ -189,6 +196,30 @@ interested in translating it, please email me represents the total interrupt count. You can use this number to figure out if you missed some interrupts. + + For some hardware that has more than one interrupt source internally, + but not separate IRQ mask and status registers, there might be + situations where userspace cannot determine what the interrupt source + was if the kernel handler disables them by writing to the chip's IRQ + register. In such a case, the kernel has to disable the IRQ completely + to leave the chip's register untouched. Now the userspace part can + determine the cause of the interrupt, but it cannot re-enable + interrupts. Another cornercase is chips where re-enabling interrupts + is a read-modify-write operation to a combined IRQ status/acknowledge + register. This would be racy if a new interrupt occurred + simultaneously. + + + To address these problems, UIO also implements a write() function. It + is normally not used and can be ignored for hardware that has only a + single interrupt source or has separate IRQ mask and status registers. + If you need it, however, a write to /dev/uioX + will call the irqcontrol() function implemented + by the driver. You have to write a 32-bit value that is usually either + 0 or 1 to disable or enable interrupts. If a driver does not implement + irqcontrol(), write() will + return with -ENOSYS. + To handle interrupts properly, your custom kernel module can @@ -362,6 +393,14 @@ device is actually used. open(), you will probably also want a custom release() function. + + +int (*irqcontrol)(struct uio_info *info, s32 irq_on) +: Optional. If you need to be able to enable or disable +interrupts from userspace by writing to /dev/uioX, +you can implement this function. The parameter irq_on +will be 0 to disable interrupts and 1 to enable them. + diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 619e8caf30db88508a3f2859d140f41ce363f552..c2371c5a98f99b5eaa785bd0affd6c40187e84e3 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -358,7 +358,7 @@ Here is a list of some of the different kernel trees available: - pcmcia, Dominik Brodowski git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git - - SCSI, James Bottomley + - SCSI, James Bottomley git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git - x86, Ingo Molnar diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt index c2321903aa09adea7f64c00d300814cc32229d1f..21bc416d887efdfabfe4c03b22639b225d72909c 100644 --- a/Documentation/Intel-IOMMU.txt +++ b/Documentation/Intel-IOMMU.txt @@ -48,7 +48,7 @@ IOVA generation is pretty generic. We used the same technique as vmalloc() but these are not global address spaces, but separate for each domain. Different DMA engines may support different number of domains. -We also allocate gaurd pages with each mapping, so we can attempt to catch +We also allocate guard pages with each mapping, so we can attempt to catch any overflow that might happen. @@ -112,4 +112,4 @@ TBD - For compatibility testing, could use unity map domain for all devices, just provide a 1-1 for all useful memory under a single domain for all devices. -- API for paravirt ops for abstracting functionlity for VMM folks. +- API for paravirt ops for abstracting functionality for VMM folks. diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt index c64158ecde4371bd79b7fc9f43753fee50084546..a6d32e65d222bbea68cba1c133db87c459c99dec 100644 --- a/Documentation/RCU/NMI-RCU.txt +++ b/Documentation/RCU/NMI-RCU.txt @@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed not to return until all ongoing NMI handlers exit. It is therefore safe to free up the handler's data as soon as synchronize_sched() returns. +Important note: for this to work, the architecture in question must +invoke irq_enter() and irq_exit() on NMI entry and exit, respectively. + Answer to Quick Quiz diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index 39ad8f56783a0faded737756afe6688bb6ae76be..9f711d2df91b00bd85f9d97debfa967c136924c5 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt @@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly structured data, such as the matrices used in scientific programs, and is thus inapplicable to most data structures in operating-system kernels. +In 1992, Henry (now Alexia) Massalin completed a dissertation advising +parallel programmers to defer processing when feasible to simplify +synchronization. RCU makes extremely heavy use of this advice. + In 1993, Jacobson [Jacobson93] verbally described what is perhaps the simplest deferred-free technique: simply waiting a fixed amount of time before freeing blocks awaiting deferred free. Jacobson did not describe @@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c], Robert Olsson described an RCU-protected trie-hash combination [RobertOlsson2006a]. +2007 saw the journal version of the award-winning RCU paper from 2006 +[ThomasEHart2007a], as well as a paper demonstrating use of Promela +and Spin to mechanically verify an optimization to Oleg Nesterov's +QRCU [PaulEMcKenney2007QRCUspin], a design document describing +preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part +LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally, +PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI]. Bibtex Entries @@ -202,6 +213,20 @@ Bibtex Entries ,Year="1991" } +@phdthesis{HMassalinPhD +,author="H. Massalin" +,title="Synthesis: An Efficient Implementation of Fundamental Operating +System Services" +,school="Columbia University" +,address="New York, NY" +,year="1992" +,annotation=" + Mondo optimizing compiler. + Wait-free stuff. + Good advice: defer work to avoid synchronization. +" +} + @unpublished{Jacobson93 ,author="Van Jacobson" ,title="Avoid Read-Side Locking Via Delayed Free" @@ -635,3 +660,86 @@ Revised: " } +@unpublished{PaulEMcKenney2007PreemptibleRCU +,Author="Paul E. McKenney" +,Title="The design of preemptible read-copy-update" +,month="October" +,day="8" +,year="2007" +,note="Available: +\url{http://lwn.net/Articles/253651/} +[Viewed October 25, 2007]" +,annotation=" + LWN article describing the design of preemptible RCU. +" +} + +######################################################################## +# +# "What is RCU?" LWN series. +# + +@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally +,Author="Paul E. McKenney and Jonathan Walpole" +,Title="What is {RCU}, Fundamentally?" +,month="December" +,day="17" +,year="2007" +,note="Available: +\url{http://lwn.net/Articles/262464/} +[Viewed December 27, 2007]" +,annotation=" + Lays out the three basic components of RCU: (1) publish-subscribe, + (2) wait for pre-existing readers to complete, and (2) maintain + multiple versions. +" +} + +@unpublished{PaulEMcKenney2008WhatIsRCUUsage +,Author="Paul E. McKenney" +,Title="What is {RCU}? Part 2: Usage" +,month="January" +,day="4" +,year="2008" +,note="Available: +\url{http://lwn.net/Articles/263130/} +[Viewed January 4, 2008]" +,annotation=" + Lays out six uses of RCU: + 1. RCU is a Reader-Writer Lock Replacement + 2. RCU is a Restricted Reference-Counting Mechanism + 3. RCU is a Bulk Reference-Counting Mechanism + 4. RCU is a Poor Man's Garbage Collector + 5. RCU is a Way of Providing Existence Guarantees + 6. RCU is a Way of Waiting for Things to Finish +" +} + +@unpublished{PaulEMcKenney2008WhatIsRCUAPI +,Author="Paul E. McKenney" +,Title="{RCU} part 3: the {RCU} {API}" +,month="January" +,day="17" +,year="2008" +,note="Available: +\url{http://lwn.net/Articles/264090/} +[Viewed January 10, 2008]" +,annotation=" + Gives an overview of the Linux-kernel RCU API and a brief annotated RCU + bibliography. +" +} + +@article{DinakarGuniguntala2008IBMSysJ +,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole" +,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}" +,Year="2008" +,Month="April" +,journal="IBM Systems Journal" +,volume="47" +,number="2" +,pages="@@-@@" +,annotation=" + RCU, realtime RCU, sleepable RCU, performance. +" +} diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 42b01bc2e1b4f01f414b340a6204fb3ea1087953..cf5562cbe35642834f28d6fca3409dffebb5c0c4 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome! detailed performance measurements show that RCU is nonetheless the right tool for the job. - The other exception would be where performance is not an issue, - and RCU provides a simpler implementation. An example of this - situation is the dynamic NMI code in the Linux 2.6 kernel, - at least on architectures where NMIs are rare. + Another exception is where performance is not an issue, and RCU + provides a simpler implementation. An example of this situation + is the dynamic NMI code in the Linux 2.6 kernel, at least on + architectures where NMIs are rare. + + Yet another exception is where the low real-time latency of RCU's + read-side primitives is critically important. 1. Does the update code have proper mutual exclusion? @@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome! 2. Do the RCU read-side critical sections make proper use of rcu_read_lock() and friends? These primitives are needed - to suppress preemption (or bottom halves, in the case of - rcu_read_lock_bh()) in the read-side critical sections, - and are also an excellent aid to readability. + to prevent grace periods from ending prematurely, which + could result in data being unceremoniously freed out from + under your read-side code, which can greatly increase the + actuarial risk of your kernel. As a rough rule of thumb, any dereference of an RCU-protected pointer must be covered by rcu_read_lock() or rcu_read_lock_bh() @@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome! be running while updates are in progress. There are a number of ways to handle this concurrency, depending on the situation: - a. Make updates appear atomic to readers. For example, + a. Use the RCU variants of the list and hlist update + primitives to add, remove, and replace elements on an + RCU-protected list. Alternatively, use the RCU-protected + trees that have been added to the Linux kernel. + + This is almost always the best approach. + + b. Proceed as in (a) above, but also maintain per-element + locks (that are acquired by both readers and writers) + that guard per-element state. Of course, fields that + the readers refrain from accessing can be guarded by the + update-side lock. + + This works quite well, also. + + c. Make updates appear atomic to readers. For example, pointer updates to properly aligned fields will appear atomic, as will individual atomic primitives. Operations performed under a lock and sequences of multiple atomic primitives will -not- appear to be atomic. - This is almost always the best approach. + This can work, but is starting to get a bit tricky. - b. Carefully order the updates and the reads so that + d. Carefully order the updates and the reads so that readers see valid data at all phases of the update. This is often more difficult than it sounds, especially given modern CPUs' tendency to reorder memory references. @@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome! when publicizing a pointer to a structure that can be traversed by an RCU read-side critical section. -5. If call_rcu(), or a related primitive such as call_rcu_bh(), - is used, the callback function must be written to be called - from softirq context. In particular, it cannot block. +5. If call_rcu(), or a related primitive such as call_rcu_bh() or + call_rcu_sched(), is used, the callback function must be + written to be called from softirq context. In particular, + it cannot block. 6. Since synchronize_rcu() can block, it cannot be called from - any sort of irq context. + any sort of irq context. Ditto for synchronize_sched() and + synchronize_srcu(). 7. If the updater uses call_rcu(), then the corresponding readers must use rcu_read_lock() and rcu_read_unlock(). If the updater uses call_rcu_bh(), then the corresponding readers must use - rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up - will result in confusion and broken kernels. + rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater + uses call_rcu_sched(), then the corresponding readers must + disable preemption. Mixing things up will result in confusion + and broken kernels. One exception to this rule: rcu_read_lock() and rcu_read_unlock() may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() @@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome! such cases is a must, of course! And the jury is still out on whether the increased speed is worth it. -8. Although synchronize_rcu() is a bit slower than is call_rcu(), - it usually results in simpler code. So, unless update - performance is critically important or the updaters cannot block, +8. Although synchronize_rcu() is slower than is call_rcu(), it + usually results in simpler code. So, unless update performance + is critically important or the updaters cannot block, synchronize_rcu() should be used in preference to call_rcu(). An especially important property of the synchronize_rcu() @@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome! number of updates per grace period. 9. All RCU list-traversal primitives, which include - list_for_each_rcu(), list_for_each_entry_rcu(), + rcu_dereference(), list_for_each_rcu(), list_for_each_entry_rcu(), list_for_each_continue_rcu(), and list_for_each_safe_rcu(), - must be within an RCU read-side critical section. RCU + must be either within an RCU read-side critical section or + must be protected by appropriate update-side locks. RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), or by similar primitives such as rcu_read_lock_bh() and rcu_read_unlock_bh(). - Use of the _rcu() list-traversal primitives outside of an - RCU read-side critical section causes no harm other than - a slight performance degradation on Alpha CPUs. It can - also be quite helpful in reducing code bloat when common - code is shared between readers and updaters. + The reason that it is permissible to use RCU list-traversal + primitives when the update-side lock is held is that doing so + can be quite helpful in reducing code bloat when common code is + shared between readers and updaters. 10. Conversely, if you are in an RCU read-side critical section, - you -must- use the "_rcu()" variants of the list macros. - Failing to do so will break Alpha and confuse people reading - your code. + and you don't hold the appropriate update-side lock, you -must- + use the "_rcu()" variants of the list macros. Failing to do so + will break Alpha and confuse people reading your code. 11. Note that synchronize_rcu() -only- guarantees to wait until all currently executing rcu_read_lock()-protected RCU read-side @@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome! must use whatever locking or other synchronization is required to safely access and/or modify that data structure. + RCU callbacks are -usually- executed on the same CPU that executed + the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(), + but are by -no- means guaranteed to be. For example, if a given + CPU goes offline while having an RCU callback pending, then that + RCU callback will execute on some surviving CPU. (If this was + not the case, a self-spawning RCU callback would prevent the + victim CPU from ever going offline.) + 14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) may only be invoked from process context. Unlike other forms of RCU, it -is- permissible to block in an SRCU read-side critical diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 2967a65269d8af7702dbe76f4780e14d91dd9061..a342b6e1cc10e900dcc9453f65f193065fe8ef42 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -10,23 +10,30 @@ status messages via printk(), which can be examined via the dmesg command (perhaps grepping for "torture"). The test is started when the module is loaded, and stops when the module is unloaded. -However, actually setting this config option to "y" results in the system -running the test immediately upon boot, and ending only when the system -is taken down. Normally, one will instead want to build the system -with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control -the test, perhaps using a script similar to the one shown at the end of -this document. Note that you will need CONFIG_MODULE_UNLOAD in order -to be able to end the test. +CONFIG_RCU_TORTURE_TEST_RUNNABLE + +It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will +result in the tests being loaded into the base kernel. In this case, +the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify +whether the RCU torture tests are to be started immediately during +boot or whether the /proc/sys/kernel/rcutorture_runnable file is used +to enable them. This /proc file can be used to repeatedly pause and +restart the tests, regardless of the initial state specified by the +CONFIG_RCU_TORTURE_TEST_RUNNABLE config option. + +You will normally -not- want to start the RCU torture tests during boot +(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing +this can sometimes be useful in finding boot-time bugs. MODULE PARAMETERS This module has the following parameters: -nreaders This is the number of RCU reading threads supported. - The default is twice the number of CPUs. Why twice? - To properly exercise RCU implementations with preemptible - read-side critical sections. +irqreaders Says to invoke RCU readers from irq level. This is currently + done via timers. Defaults to "1" for variants of RCU that + permit this. (Or, more accurately, variants of RCU that do + -not- permit this know to ignore this variable.) nfakewriters This is the number of RCU fake writer threads to run. Fake writer threads repeatedly use the synchronous "wait for @@ -37,6 +44,16 @@ nfakewriters This is the number of RCU fake writer threads to run. Fake to trigger special cases caused by multiple writers, such as the synchronize_srcu() early return optimization. +nreaders This is the number of RCU reading threads supported. + The default is twice the number of CPUs. Why twice? + To properly exercise RCU implementations with preemptible + read-side critical sections. + +shuffle_interval + The number of seconds to keep the test threads affinitied + to a particular subset of the CPUs, defaults to 3 seconds. + Used in conjunction with test_no_idle_hz. + stat_interval The number of seconds between output of torture statistics (via printk()). Regardless of the interval, statistics are printed when the module is unloaded. @@ -44,10 +61,11 @@ stat_interval The number of seconds between output of torture be printed -only- when the module is unloaded, and this is the default. -shuffle_interval - The number of seconds to keep the test threads affinitied - to a particular subset of the CPUs, defaults to 5 seconds. - Used in conjunction with test_no_idle_hz. +stutter The length of time to run the test before pausing for this + same period of time. Defaults to "stutter=5", so as + to run and pause for (roughly) five-second intervals. + Specifying "stutter=0" causes the test to run continuously + without pausing, which is the old default behavior. test_no_idle_hz Whether or not to test the ability of RCU to operate in a kernel that disables the scheduling-clock interrupt to diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index e0d6d99b8f9bb0dc17f647dbd8256aad1282dd2d..e04d643a9f57a802e59057165f719f36b5ebf5ab 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -1,3 +1,11 @@ +Please note that the "What is RCU?" LWN series is an excellent place +to start learning about RCU: + +1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ +2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ +3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ + + What is RCU? RCU is a synchronization mechanism that was added to the Linux kernel @@ -772,26 +780,18 @@ Linux-kernel source code, but it helps to have a full list of the APIs, since there does not appear to be a way to categorize them in docbook. Here is the list, by category. -Markers for RCU read-side critical sections: - - rcu_read_lock - rcu_read_unlock - rcu_read_lock_bh - rcu_read_unlock_bh - srcu_read_lock - srcu_read_unlock - RCU pointer/list traversal: rcu_dereference + list_for_each_entry_rcu + hlist_for_each_entry_rcu + list_for_each_rcu (to be deprecated in favor of list_for_each_entry_rcu) - list_for_each_entry_rcu list_for_each_continue_rcu (to be deprecated in favor of new list_for_each_entry_continue_rcu) - hlist_for_each_entry_rcu -RCU pointer update: +RCU pointer/list update: rcu_assign_pointer list_add_rcu @@ -799,16 +799,36 @@ RCU pointer update: list_del_rcu list_replace_rcu hlist_del_rcu + hlist_add_after_rcu + hlist_add_before_rcu hlist_add_head_rcu + hlist_replace_rcu + list_splice_init_rcu() -RCU grace period: +RCU: Critical sections Grace period Barrier + + rcu_read_lock synchronize_net rcu_barrier + rcu_read_unlock synchronize_rcu + call_rcu + + +bh: Critical sections Grace period Barrier + + rcu_read_lock_bh call_rcu_bh rcu_barrier_bh + rcu_read_unlock_bh + + +sched: Critical sections Grace period Barrier + + [preempt_disable] synchronize_sched rcu_barrier_sched + [and friends] call_rcu_sched + + +SRCU: Critical sections Grace period Barrier + + srcu_read_lock synchronize_srcu N/A + srcu_read_unlock - synchronize_net - synchronize_sched - synchronize_rcu - synchronize_srcu - call_rcu - call_rcu_bh See the comment headers in the source code (or the docbook generated from them) for more information. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 118ca6e9404f47ded06c7ee3a287003d00377e56..f79ad9ff6031aa4ee622d9f0be3e0c1604e2d14b 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -528,7 +528,33 @@ See more details on the proper patch format in the following references. +16) Sending "git pull" requests (from Linus emails) +Please write the git repo address and branch name alone on the same line +so that I can't even by mistake pull from the wrong branch, and so +that a triple-click just selects the whole thing. + +So the proper format is something along the lines of: + + "Please pull from + + git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus + + to get these changes:" + +so that I don't have to hunt-and-peck for the address and inevitably +get it wrong (actually, I've only gotten it wrong a few times, and +checking against the diffstat tells me when I get it wrong, but I'm +just a lot more comfortable when I don't have to "look for" the right +thing to pull, and double-check that I have the right branch-name). + + +Please use "git diff -M --stat --summary" to generate the diffstat: +the -M enables rename detection, and the summary enables a summary of +new/deleted or renamed files. + +With rename detection, the statistics are rather different [...] +because git will notice that a fair number of the changes are renames. ----------------------------------- SECTION 2 - HINTS, TIPS, AND TRICKS diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt index 1443cd71d2631241f13286e544296ef286d91c90..8a12f0730c94da018615aebc8e657daf65eadd80 100644 --- a/Documentation/accounting/delay-accounting.txt +++ b/Documentation/accounting/delay-accounting.txt @@ -11,6 +11,7 @@ the delays experienced by a task while a) waiting for a CPU (while being runnable) b) completion of synchronous block I/O initiated by the task c) swapping in pages +d) memory reclaim and makes these statistics available to userspace through the taskstats interface. @@ -41,7 +42,7 @@ this structure. See include/linux/taskstats.h for a description of the fields pertaining to delay accounting. It will generally be in the form of counters returning the cumulative -delay seen for cpu, sync block I/O, swapin etc. +delay seen for cpu, sync block I/O, swapin, memory reclaim etc. Taking the difference of two successive readings of a given counter (say cpu_delay_total) for a task will give the delay @@ -94,7 +95,9 @@ CPU count real total virtual total delay total 7876 92005750 100000000 24001500 IO count delay total 0 0 -MEM count delay total +SWAP count delay total + 0 0 +RECLAIM count delay total 0 0 Get delays seen in executing a given simple command @@ -108,5 +111,7 @@ CPU count real total virtual total delay total 6 4000250 4000000 0 IO count delay total 0 0 -MEM count delay total +SWAP count delay total + 0 0 +RECLAIM count delay total 0 0 diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index 40121b5cca14cab3993000c53078ef2d8484f001..3f7755f3963f11fd31ba598a8f6199371c87202f 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -196,14 +196,18 @@ void print_delayacct(struct taskstats *t) " %15llu%15llu%15llu%15llu\n" "IO %15s%15s\n" " %15llu%15llu\n" - "MEM %15s%15s\n" + "SWAP %15s%15s\n" + " %15llu%15llu\n" + "RECLAIM %12s%15s\n" " %15llu%15llu\n", "count", "real total", "virtual total", "delay total", t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total, t->cpu_delay_total, "count", "delay total", t->blkio_count, t->blkio_delay_total, - "count", "delay total", t->swapin_count, t->swapin_delay_total); + "count", "delay total", t->swapin_count, t->swapin_delay_total, + "count", "delay total", + t->freepages_count, t->freepages_delay_total); } void task_context_switch_counts(struct taskstats *t) diff --git a/Documentation/accounting/taskstats-struct.txt b/Documentation/accounting/taskstats-struct.txt index cd784f46bf8abefb9ed24257aaf5fcf4df2ecb00..e7512c061c1572f5127b7bb861ded0f0515bc6af 100644 --- a/Documentation/accounting/taskstats-struct.txt +++ b/Documentation/accounting/taskstats-struct.txt @@ -6,7 +6,7 @@ This document contains an explanation of the struct taskstats fields. There are three different groups of fields in the struct taskstats: 1) Common and basic accounting fields - If CONFIG_TASKSTATS is set, the taskstats inteface is enabled and + If CONFIG_TASKSTATS is set, the taskstats interface is enabled and the common fields and basic accounting fields are collected for delivery at do_exit() of a task. 2) Delay accounting fields @@ -26,6 +26,8 @@ There are three different groups of fields in the struct taskstats: 5) Time accounting for SMT machines +6) Extended delay accounting fields for memory reclaim + Future extension should add fields to the end of the taskstats struct, and should not change the relative position of each field within the struct. @@ -170,4 +172,9 @@ struct taskstats { __u64 ac_utimescaled; /* utime scaled on frequency etc */ __u64 ac_stimescaled; /* stime scaled on frequency etc */ __u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */ + +6) Extended delay accounting fields for memory reclaim + /* Delay waiting for memory reclaim */ + __u64 freepages_count; + __u64 freepages_delay_total; } diff --git a/Documentation/arm/IXP4xx b/Documentation/arm/IXP4xx index 43edb4ecf27dbdd7138ea3c969f4c45cbcd720c7..72fbcc4fcab095fd61a886d3d542692a6f94d21e 100644 --- a/Documentation/arm/IXP4xx +++ b/Documentation/arm/IXP4xx @@ -32,7 +32,7 @@ Linux currently supports the following features on the IXP4xx chips: - Flash access (MTD/JFFS) - I2C through GPIO on IXP42x - GPIO for input/output/interrupts - See include/asm-arm/arch-ixp4xx/platform.h for access functions. + See arch/arm/mach-ixp4xx/include/mach/platform.h for access functions. - Timers (watchdog, OS) The following components of the chips are not supported by Linux and diff --git a/Documentation/arm/Interrupts b/Documentation/arm/Interrupts index 0d3dbf1099bcc90cae1b364002ef32e15dcdc06a..f09ab1b90ef1b486bb55273a53590b22fb8193fa 100644 --- a/Documentation/arm/Interrupts +++ b/Documentation/arm/Interrupts @@ -138,14 +138,8 @@ So, what's changed? Set active the IRQ edge(s)/level. This replaces the SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge() - function. Type should be one of the following: - - #define IRQT_NOEDGE (0) - #define IRQT_RISING (__IRQT_RISEDGE) - #define IRQT_FALLING (__IRQT_FALEDGE) - #define IRQT_BOTHEDGE (__IRQT_RISEDGE|__IRQT_FALEDGE) - #define IRQT_LOW (__IRQT_LOWLVL) - #define IRQT_HIGH (__IRQT_HIGHLVL) + function. Type should be one of IRQ_TYPE_xxx defined in + 3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type. @@ -164,7 +158,7 @@ So, what's changed? be re-checked for pending events. (see the Neponset IRQ handler for details). -7. fixup_irq() is gone, as is include/asm-arm/arch-*/irq.h +7. fixup_irq() is gone, as is arch/arm/mach-*/include/mach/irq.h Please note that this will not solve all problems - some of them are hardware based. Mixing level-based and edge-based IRQs on the same diff --git a/Documentation/arm/README b/Documentation/arm/README index 9b9c8226fdc428665fb2e792ce142b77abf2d80d..d98783fbe0c7c8b6ff95a6e61931fc56d4ef7a3b 100644 --- a/Documentation/arm/README +++ b/Documentation/arm/README @@ -79,7 +79,7 @@ Machine/Platform support To this end, we now have arch/arm/mach-$(MACHINE) directories which are designed to house the non-driver files for a particular machine (eg, PCI, memory management, architecture definitions etc). For all future - machines, there should be a corresponding include/asm-arm/arch-$(MACHINE) + machines, there should be a corresponding arch/arm/mach-$(MACHINE)/include/mach directory. @@ -176,7 +176,7 @@ Kernel entry (head.S) class typically based around one or more system on a chip devices, and acts as a natural container around the actual implementations. These classes are given directories - arch/arm/mach- and - include/asm-arm/arch- - which contain the source files to + arch/arm/mach- - which contain the source files to/include/mach support the machine class. This directories also contain any machine specific supporting code. diff --git a/Documentation/arm/Samsung-S3C24XX/GPIO.txt b/Documentation/arm/Samsung-S3C24XX/GPIO.txt index 8caea8c237eec0b9a033414c53ab2ea30b313800..b5d20c0b2ab46d60472797dad8dcf197971d9199 100644 --- a/Documentation/arm/Samsung-S3C24XX/GPIO.txt +++ b/Documentation/arm/Samsung-S3C24XX/GPIO.txt @@ -16,13 +16,13 @@ Introduction Headers ------- - See include/asm-arm/arch-s3c2410/regs-gpio.h for the list + See arch/arm/mach-s3c2410/include/mach/regs-gpio.h for the list of GPIO pins, and the configuration values for them. This - is included by using #include + is included by using #include The GPIO management functions are defined in the hardware - header include/asm-arm/arch-s3c2410/hardware.h which can be - included by #include + header arch/arm/mach-s3c2410/include/mach/hardware.h which can be + included by #include A useful amount of documentation can be found in the hardware header on how the GPIO functions (and others) work. diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt index d04e1e30c47f8ff1a8613a369506e58cf5a41a21..014a8ec4877d296be7864abd82e663fb93e8c8f0 100644 --- a/Documentation/arm/Samsung-S3C24XX/Overview.txt +++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt @@ -36,7 +36,7 @@ Layout in arch/arm/mach-s3c2410 and S3C2440 in arch/arm/mach-s3c2440 Register, kernel and platform data definitions are held in the - include/asm-arm/arch-s3c2410 directory. + arch/arm/mach-s3c2410 directory./include/mach Machines diff --git a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt index b93b68e2b143ec72bed475f6a4203dc3fc485c70..67671eba423125179a6ab7f0e831225dab43bbbb 100644 --- a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt +++ b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt @@ -49,7 +49,7 @@ Board Support Platform Data ------------- - See linux/include/asm-arm/arch-s3c2410/usb-control.h for the + See arch/arm/mach-s3c2410/include/mach/usb-control.h for the descriptions of the platform device data. An implementation can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c . diff --git a/Documentation/bt8xxgpio.txt b/Documentation/bt8xxgpio.txt new file mode 100644 index 0000000000000000000000000000000000000000..d8297e4ebd265eb5dd273bad20162e51d369b25a --- /dev/null +++ b/Documentation/bt8xxgpio.txt @@ -0,0 +1,67 @@ +=============================================================== +== BT8XXGPIO driver == +== == +== A driver for a selfmade cheap BT8xx based PCI GPIO-card == +== == +== For advanced documentation, see == +== http://www.bu3sch.de/btgpio.php == +=============================================================== + + +A generic digital 24-port PCI GPIO card can be built out of an ordinary +Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The +Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily +find them used for low prices on the net. + +The bt8xx chip does have 24 digital GPIO ports. +These ports are accessible via 24 pins on the SMD chip package. + + +============================================== +== How to physically access the GPIO pins == +============================================== + +The are several ways to access these pins. One might unsolder the whole chip +and put it on a custom PCI board, or one might only unsolder each individual +GPIO pin and solder that to some tiny wire. As the chip package really is tiny +there are some advanced soldering skills needed in any case. + +The physical pinouts are drawn in the following ASCII art. +The GPIO pins are marked with G00-G23 + + G G G G G G G G G G G G G G G G G G + 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 + | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + --------------------------------------------------------------------------- + --| ^ ^ |-- + --| pin 86 pin 67 |-- + --| |-- + --| pin 61 > |-- G18 + --| |-- G19 + --| |-- G20 + --| |-- G21 + --| |-- G22 + --| pin 56 > |-- G23 + --| |-- + --| Brooktree 878/879 |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| |-- + --| O |-- + --| |-- + --------------------------------------------------------------------------- + | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + ^ + This is pin 1 + diff --git a/Documentation/cli-sti-removal.txt b/Documentation/cli-sti-removal.txt deleted file mode 100644 index 60932b02fcb333a8c4bd60e4ac0e8296f33c8784..0000000000000000000000000000000000000000 --- a/Documentation/cli-sti-removal.txt +++ /dev/null @@ -1,133 +0,0 @@ - -#### cli()/sti() removal guide, started by Ingo Molnar - - -as of 2.5.28, five popular macros have been removed on SMP, and -are being phased out on UP: - - cli(), sti(), save_flags(flags), save_flags_cli(flags), restore_flags(flags) - -until now it was possible to protect driver code against interrupt -handlers via a cli(), but from now on other, more lightweight methods -have to be used for synchronization, such as spinlocks or semaphores. - -for example, driver code that used to do something like: - - struct driver_data; - - irq_handler (...) - { - .... - driver_data.finish = 1; - driver_data.new_work = 0; - .... - } - - ... - - ioctl_func (...) - { - ... - cli(); - ... - driver_data.finish = 0; - driver_data.new_work = 2; - ... - sti(); - ... - } - -was SMP-correct because the cli() function ensured that no -interrupt handler (amongst them the above irq_handler()) function -would execute while the cli()-ed section is executing. - -but from now on a more direct method of locking has to be used: - - DEFINE_SPINLOCK(driver_lock); - struct driver_data; - - irq_handler (...) - { - unsigned long flags; - .... - spin_lock_irqsave(&driver_lock, flags); - .... - driver_data.finish = 1; - driver_data.new_work = 0; - .... - spin_unlock_irqrestore(&driver_lock, flags); - .... - } - - ... - - ioctl_func (...) - { - ... - spin_lock_irq(&driver_lock); - ... - driver_data.finish = 0; - driver_data.new_work = 2; - ... - spin_unlock_irq(&driver_lock); - ... - } - -the above code has a number of advantages: - -- the locking relation is easier to understand - actual lock usage - pinpoints the critical sections. cli() usage is too opaque. - Easier to understand means it's easier to debug. - -- it's faster, because spinlocks are faster to acquire than the - potentially heavily-used IRQ lock. Furthermore, your driver does - not have to wait eg. for a big heavy SCSI interrupt to finish, - because the driver_lock spinlock is only used by your driver. - cli() on the other hand was used by many drivers, and extended - the critical section to the whole IRQ handler function - creating - serious lock contention. - - -to make the transition easier, we've still kept the cli(), sti(), -save_flags(), save_flags_cli() and restore_flags() macros defined -on UP systems - but their usage will be phased out until 2.6 is -released. - -drivers that want to disable local interrupts (interrupts on the -current CPU), can use the following five macros: - - local_irq_disable(), local_irq_enable(), local_save_flags(flags), - local_irq_save(flags), local_irq_restore(flags) - -but beware, their meaning and semantics are much simpler, far from -that of the old cli(), sti(), save_flags(flags) and restore_flags(flags) -SMP meaning: - - local_irq_disable() => turn local IRQs off - - local_irq_enable() => turn local IRQs on - - local_save_flags(flags) => save the current IRQ state into flags. The - state can be on or off. (on some - architectures there's even more bits in it.) - - local_irq_save(flags) => save the current IRQ state into flags and - disable interrupts. - - local_irq_restore(flags) => restore the IRQ state from flags. - -(local_irq_save can save both irqs on and irqs off state, and -local_irq_restore can restore into both irqs on and irqs off state.) - -another related change is that synchronize_irq() now takes a parameter: -synchronize_irq(irq). This change too has the purpose of making SMP -synchronization more lightweight - this way you can wait for your own -interrupt handler to finish, no need to wait for other IRQ sources. - - -why were these changes done? The main reason was the architectural burden -of maintaining the cli()/sti() interface - it became a real problem. The -new interrupt system is much more streamlined, easier to understand, debug, -and it's also a bit faster - the same happened to it that will happen to -cli()/sti() using drivers once they convert to spinlocks :-) - diff --git a/Documentation/controllers/memory.txt b/Documentation/controllers/memory.txt index 866b9cd9a9590d6b6b8c3d577038e8d51234082b..9b53d5827361fd647f3388212e648f502defc698 100644 --- a/Documentation/controllers/memory.txt +++ b/Documentation/controllers/memory.txt @@ -242,8 +242,7 @@ rmdir() if there are no tasks. 1. Add support for accounting huge pages (as a separate controller) 2. Make per-cgroup scanner reclaim not-shared pages first 3. Teach controller to account for shared-pages -4. Start reclamation when the limit is lowered -5. Start reclamation in the background when the limit is +4. Start reclamation in the background when the limit is not yet hit but the usage is getting closer Summary diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index dcec0564d04075a9751e462968f956385405f283..5b0cfa67aff9c89ebc6d0f9dd4d467e8be8ae71f 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -122,7 +122,7 @@ around '10000' or more. show_sampling_rate_(min|max): the minimum and maximum sampling rates available that you may set 'sampling_rate' to. -up_threshold: defines what the average CPU usaged between the samplings +up_threshold: defines what the average CPU usage between the samplings of 'sampling_rate' needs to be for the kernel to make a decision on whether it should increase the frequency. For example when it is set to its default value of '80' it means that between the checking diff --git a/Documentation/edac.txt b/Documentation/edac.txt index a5c36842ecef4ec103ff7f44b5499ed9963db0c2..8eda3fb664166726163bf59565c9d52c76af6c79 100644 --- a/Documentation/edac.txt +++ b/Documentation/edac.txt @@ -222,74 +222,9 @@ both csrow2 and csrow3 are populated, this indicates a dual ranked set of DIMMs for channels 0 and 1. -Within each of the 'mc','mcX' and 'csrowX' directories are several +Within each of the 'mcX' and 'csrowX' directories are several EDAC control and attribute files. - -============================================================================ -DIRECTORY 'mc' - -In directory 'mc' are EDAC system overall control and attribute files: - - -Panic on UE control file: - - 'edac_mc_panic_on_ue' - - An uncorrectable error will cause a machine panic. This is usually - desirable. It is a bad idea to continue when an uncorrectable error - occurs - it is indeterminate what was uncorrected and the operating - system context might be so mangled that continuing will lead to further - corruption. If the kernel has MCE configured, then EDAC will never - notice the UE. - - LOAD TIME: module/kernel parameter: panic_on_ue=[0|1] - - RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_panic_on_ue - - -Log UE control file: - - 'edac_mc_log_ue' - - Generate kernel messages describing uncorrectable errors. These errors - are reported through the system message log system. UE statistics - will be accumulated even when UE logging is disabled. - - LOAD TIME: module/kernel parameter: log_ue=[0|1] - - RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ue - - -Log CE control file: - - 'edac_mc_log_ce' - - Generate kernel messages describing correctable errors. These - errors are reported through the system message log system. - CE statistics will be accumulated even when CE logging is disabled. - - LOAD TIME: module/kernel parameter: log_ce=[0|1] - - RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ce - - -Polling period control file: - - 'edac_mc_poll_msec' - - The time period, in milliseconds, for polling for error information. - Too small a value wastes resources. Too large a value might delay - necessary handling of errors and might loose valuable information for - locating the error. 1000 milliseconds (once each second) is the current - default. Systems which require all the bandwidth they can get, may - increase this. - - LOAD TIME: module/kernel parameter: poll_msec=[0|1] - - RUN TIME: echo "1000" >/sys/devices/system/edac/mc/edac_mc_poll_msec - - ============================================================================ 'mcX' DIRECTORIES @@ -392,7 +327,7 @@ Sdram memory scrubbing rate: 'sdram_scrub_rate' Read/Write attribute file that controls memory scrubbing. The scrubbing - rate is set by writing a minimum bandwith in bytes/sec to the attribute + rate is set by writing a minimum bandwidth in bytes/sec to the attribute file. The rate will be translated to an internal value that gives at least the specified rate. @@ -537,7 +472,6 @@ Channel 1 DIMM Label control file: motherboard specific and determination of this information must occur in userland at this time. - ============================================================================ SYSTEM LOGGING @@ -570,7 +504,6 @@ error type, a notice of "no info" and then an optional, driver-specific error message. - ============================================================================ PCI Bus Parity Detection @@ -604,6 +537,74 @@ Enable/Disable PCI Parity checking control file: echo "0" >/sys/devices/system/edac/pci/check_pci_parity +Parity Count: + + 'pci_parity_count' + + This attribute file will display the number of parity errors that + have been detected. + + +============================================================================ +MODULE PARAMETERS + +Panic on UE control file: + + 'edac_mc_panic_on_ue' + + An uncorrectable error will cause a machine panic. This is usually + desirable. It is a bad idea to continue when an uncorrectable error + occurs - it is indeterminate what was uncorrected and the operating + system context might be so mangled that continuing will lead to further + corruption. If the kernel has MCE configured, then EDAC will never + notice the UE. + + LOAD TIME: module/kernel parameter: edac_mc_panic_on_ue=[0|1] + + RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue + + +Log UE control file: + + 'edac_mc_log_ue' + + Generate kernel messages describing uncorrectable errors. These errors + are reported through the system message log system. UE statistics + will be accumulated even when UE logging is disabled. + + LOAD TIME: module/kernel parameter: edac_mc_log_ue=[0|1] + + RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ue + + +Log CE control file: + + 'edac_mc_log_ce' + + Generate kernel messages describing correctable errors. These + errors are reported through the system message log system. + CE statistics will be accumulated even when CE logging is disabled. + + LOAD TIME: module/kernel parameter: edac_mc_log_ce=[0|1] + + RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ce + + +Polling period control file: + + 'edac_mc_poll_msec' + + The time period, in milliseconds, for polling for error information. + Too small a value wastes resources. Too large a value might delay + necessary handling of errors and might loose valuable information for + locating the error. 1000 milliseconds (once each second) is the current + default. Systems which require all the bandwidth they can get, may + increase this. + + LOAD TIME: module/kernel parameter: edac_mc_poll_msec=[0|1] + + RUN TIME: echo "1000" > /sys/module/edac_core/parameters/edac_mc_poll_msec + Panic on PCI PARITY Error: @@ -614,21 +615,13 @@ Panic on PCI PARITY Error: error has been detected. - module/kernel parameter: panic_on_pci_parity=[0|1] + module/kernel parameter: edac_panic_on_pci_pe=[0|1] Enable: - echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity + echo "1" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe Disable: - echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity - - -Parity Count: - - 'pci_parity_count' - - This attribute file will display the number of parity errors that - have been detected. + echo "0" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe diff --git a/Documentation/fb/sh7760fb.txt b/Documentation/fb/sh7760fb.txt new file mode 100644 index 0000000000000000000000000000000000000000..c87bfe5c630a16893aed884a86f6e39506d94182 --- /dev/null +++ b/Documentation/fb/sh7760fb.txt @@ -0,0 +1,131 @@ +SH7760/SH7763 integrated LCDC Framebuffer driver +================================================ + +0. Overwiew +----------- +The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which +supports (in theory) resolutions ranging from 1x1 to 1024x1024, +with color depths ranging from 1 to 16 bits, on STN, DSTN and TFT Panels. + +Caveats: +* Framebuffer memory must be a large chunk allocated at the top + of Area3 (HW requirement). Because of this requirement you should NOT + make the driver a module since at runtime it may become impossible to + get a large enough contiguous chunk of memory. + +* The driver does not support changing resolution while loaded + (displays aren't hotpluggable anyway) + +* Heavy flickering may be observed + a) if you're using 15/16bit color modes at >= 640x480 px resolutions, + b) during PCMCIA (or any other slow bus) activity. + +* Rotation works only 90degress clockwise, and only if horizontal + resolution is <= 320 pixels. + +files: drivers/video/sh7760fb.c + include/asm-sh/sh7760fb.h + Documentation/fb/sh7760fb.txt + +1. Platform setup +----------------- +SH7760: + Video data is fetched via the DMABRG DMA engine, so you have to + configure the SH DMAC for DMABRG mode (write 0x94808080 to the + DMARSRA register somewhere at boot). + + PFC registers PCCR and PCDR must be set to peripheral mode. + (write zeros to both). + +The driver does NOT do the above for you since board setup is, well, job +of the board setup code. + +2. Panel definitions +-------------------- +The LCDC must explicitly be told about the type of LCD panel +attached. Data must be wrapped in a "struct sh7760fb_platdata" and +passed to the driver as platform_data. + +Suggest you take a closer look at the SH7760 Manual, Section 30. +(http://documentation.renesas.com/eng/products/mpumcu/e602291_sh7760.pdf) + +The following code illustrates what needs to be done to +get the framebuffer working on a 640x480 TFT: + +====================== cut here ====================================== + +#include +#include + +/* + * NEC NL6440bc26-01 640x480 TFT + * dotclock 25175 kHz + * Xres 640 Yres 480 + * Htotal 800 Vtotal 525 + * HsynStart 656 VsynStart 490 + * HsynLenn 30 VsynLenn 2 + * + * The linux framebuffer layer does not use the syncstart/synclen + * values but right/left/upper/lower margin values. The comments + * for the x_margin explain how to calculate those from given + * panel sync timings. + */ +static struct fb_videomode nl6448bc26 = { + .name = "NL6448BC26", + .refresh = 60, + .xres = 640, + .yres = 480, + .pixclock = 39683, /* in picoseconds! */ + .hsync_len = 30, + .vsync_len = 2, + .left_margin = 114, /* HTOT - (HSYNSLEN + HSYNSTART) */ + .right_margin = 16, /* HSYNSTART - XRES */ + .upper_margin = 33, /* VTOT - (VSYNLEN + VSYNSTART) */ + .lower_margin = 10, /* VSYNSTART - YRES */ + .sync = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT, + .vmode = FB_VMODE_NONINTERLACED, + .flag = 0, +}; + +static struct sh7760fb_platdata sh7760fb_nl6448 = { + .def_mode = &nl6448bc26, + .ldmtr = LDMTR_TFT_COLOR_16, /* 16bit TFT panel */ + .lddfr = LDDFR_8BPP, /* we want 8bit output */ + .ldpmmr = 0x0070, + .ldpspr = 0x0500, + .ldaclnr = 0, + .ldickr = LDICKR_CLKSRC(LCDC_CLKSRC_EXTERNAL) | + LDICKR_CLKDIV(1), + .rotate = 0, + .novsync = 1, + .blank = NULL, +}; + +/* SH7760: + * 0xFE300800: 256 * 4byte xRGB palette ram + * 0xFE300C00: 42 bytes ctrl registers + */ +static struct resource sh7760_lcdc_res[] = { + [0] = { + .start = 0xFE300800, + .end = 0xFE300CFF, + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = 65, + .end = 65, + .flags = IORESOURCE_IRQ, + }, +}; + +static struct platform_device sh7760_lcdc_dev = { + .dev = { + .platform_data = &sh7760fb_nl6448, + }, + .name = "sh7760-lcdc", + .id = -1, + .resource = sh7760_lcdc_res, + .num_resources = ARRAY_SIZE(sh7760_lcdc_res), +}; + +====================== cut here ====================================== diff --git a/Documentation/fb/tridentfb.txt b/Documentation/fb/tridentfb.txt index 8a6c8a43e6a37803e4fcbd640f1b4578f5a611c6..45d9de5b13a3cb9f8143484d49737ae87254fdf7 100644 --- a/Documentation/fb/tridentfb.txt +++ b/Documentation/fb/tridentfb.txt @@ -3,11 +3,25 @@ Tridentfb is a framebuffer driver for some Trident chip based cards. The following list of chips is thought to be supported although not all are tested: -those from the Image series with Cyber in their names - accelerated -those with Blade in their names (Blade3D,CyberBlade...) - accelerated -the newer CyberBladeXP family - nonaccelerated - -Only PCI/AGP based cards are supported, none of the older Tridents. +those from the TGUI series 9440/96XX and with Cyber in their names +those from the Image series and with Cyber in their names +those with Blade in their names (Blade3D,CyberBlade...) +the newer CyberBladeXP family + +All families are accelerated. Only PCI/AGP based cards are supported, +none of the older Tridents. +The driver supports 8, 16 and 32 bits per pixel depths. +The TGUI family requires a line length to be power of 2 if acceleration +is enabled. This means that range of possible resolutions and bpp is +limited comparing to the range if acceleration is disabled (see list +of parameters below). + +Known bugs: +1. The driver randomly locks up on 3DImage975 chip with acceleration + enabled. The same happens in X11 (Xorg). +2. The ramdac speeds require some more fine tuning. It is possible to + switch resolution which the chip does not support at some depths for + older chips. How to use it? ============== @@ -17,12 +31,11 @@ video=tridentfb The parameters for tridentfb are concatenated with a ':' as in this example. -video=tridentfb:800x600,bpp=16,noaccel +video=tridentfb:800x600-16@75,noaccel The second level parameters that tridentfb understands are: noaccel - turns off acceleration (when it doesn't work for your card) -accel - force text acceleration (for boards which by default are noacceled) fp - use flat panel related stuff crt - assume monitor is present instead of fp @@ -31,21 +44,24 @@ center - for flat panels and resolutions smaller than native size center the image, otherwise use stretch -memsize - integer value in Kb, use if your card's memory size is misdetected. +memsize - integer value in KB, use if your card's memory size is misdetected. look at the driver output to see what it says when initializing. -memdiff - integer value in Kb,should be nonzero if your card reports - more memory than it actually has.For instance mine is 192K less than + +memdiff - integer value in KB, should be nonzero if your card reports + more memory than it actually has. For instance mine is 192K less than detection says in all three BIOS selectable situations 2M, 4M, 8M. Only use if your video memory is taken from main memory hence of - configurable size.Otherwise use memsize. - If in some modes which barely fit the memory you see garbage at the bottom - this might help by not letting change to that mode anymore. + configurable size. Otherwise use memsize. + If in some modes which barely fit the memory you see garbage + at the bottom this might help by not letting change to that mode + anymore. nativex - the width in pixels of the flat panel.If you know it (usually 1024 800 or 1280) and it is not what the driver seems to detect use it. -bpp - bits per pixel (8,16 or 32) -mode - a mode name like 800x600 (as described in Documentation/fb/modedb.txt) +bpp - bits per pixel (8,16 or 32) +mode - a mode name like 800x600-8@75 as described in + Documentation/fb/modedb.txt Using insane values for the above parameters will probably result in driver misbehaviour so take care(for instance memsize=12345678 or memdiff=23784 or diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 65a1482457a89ec9d6c5beec89464bc049f3f5e2..c23955404bf5d5056f7ddf448ea52415b6d3adfd 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -47,6 +47,30 @@ Who: Mauro Carvalho Chehab --------------------------- +What: old tuner-3036 i2c driver +When: 2.6.28 +Why: This driver is for VERY old i2c-over-parallel port teletext receiver + boxes. Rather then spending effort on converting this driver to V4L2, + and since it is extremely unlikely that anyone still uses one of these + devices, it was decided to drop it. +Who: Hans Verkuil + Mauro Carvalho Chehab + + --------------------------- + +What: V4L2 dpc7146 driver +When: 2.6.28 +Why: Old driver for the dpc7146 demonstration board that is no longer + relevant. The last time this was tested on actual hardware was + probably around 2002. Since this is a driver for a demonstration + board the decision was made to remove it rather than spending a + lot of effort continually updating this driver to stay in sync + with the latest internal V4L2 or I2C API. +Who: Hans Verkuil + Mauro Carvalho Chehab + +--------------------------- + What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) When: November 2005 Files: drivers/pcmcia/: pcmcia_ioctl.c @@ -138,24 +162,6 @@ Who: Kay Sievers --------------------------- -What: find_task_by_pid -When: 2.6.26 -Why: With pid namespaces, calling this funciton will return the - wrong task when called from inside a namespace. - - The best way to save a task pid and find a task by this - pid later, is to find this task's struct pid pointer (or get - it directly from the task) and call pid_task() later. - - If someone really needs to get a task by its pid_t, then - he most likely needs the find_task_by_vpid() to get the - task from the same namespace as the current task is in, but - this may be not so in general. - -Who: Pavel Emelyanov - ---------------------------- - What: ACPI procfs interface When: July 2008 Why: ACPI sysfs conversion should be finished by January 2008. @@ -300,11 +306,15 @@ Who: ocfs2-devel@oss.oracle.com --------------------------- -What: asm/semaphore.h -When: 2.6.26 -Why: Implementation became generic; users should now include - linux/semaphore.h instead. -Who: Matthew Wilcox +What: SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD, + SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD +When: June 2009 +Why: A newer version of the options have been introduced in 2005 that + removes the limitions of the old API. The sctp library has been + converted to use these new options at the same time. Any user + space app that directly uses the old options should convert to using + the new options. +Who: Vlad Yasevich --------------------------- @@ -314,3 +324,23 @@ Why: This option was introduced just to allow older lm-sensors userspace to keep working over the upgrade to 2.6.26. At the scheduled time of removal fixed lm-sensors (2.x or 3.x) should be readily available. Who: Rene Herman + +--------------------------- + +What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS + (in net/core/net-sysfs.c) +When: After the only user (hal) has seen a release with the patches + for enough time, probably some time in 2010. +Why: Over 1K .text/.data size reduction, data is available in other + ways (ioctls) +Who: Johannes Berg + +--------------------------- + +What: CONFIG_NF_CT_ACCT +When: 2.6.29 +Why: Accounting can now be enabled/disabled without kernel recompilation. + Currently used only to set a default value for a feature that is also + controlled by a kernel/module/sysfs/sysctl parameter. +Who: Krzysztof Piotr Oledzki + diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 8b22d7d8b99166b6db2e037644e9c5d7ff03695e..680fb566b9286bcc71a527c5f75d4b94fc9ac215 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -510,6 +510,7 @@ prototypes: void (*close)(struct vm_area_struct*); int (*fault)(struct vm_area_struct*, struct vm_fault *); int (*page_mkwrite)(struct vm_area_struct *, struct page *); + int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); locking rules: BKL mmap_sem PageLocked(page) @@ -517,6 +518,7 @@ open: no yes close: no yes fault: no yes page_mkwrite: no yes no +access: no yes ->page_mkwrite() is called when a previously read-only page is about to become writeable. The file system is responsible for @@ -525,6 +527,11 @@ taking to lock out truncate, the page range should be verified to be within i_size. The page mapping should also be checked that it is not NULL. + ->access() is called when get_user_pages() fails in +acces_process_vm(), typically used to debug a process through +/proc/pid/mem or ptrace. This function is needed only for +VM_IO | VM_PFNMAP VMAs. + ================================================================================ Dubious stuff diff --git a/Documentation/filesystems/bfs.txt b/Documentation/filesystems/bfs.txt index ea825e178e797b3b8af53d8d6d5fa2c1f1974a0d..78043d5a8fc35f01c343ced6f956aadee03e1026 100644 --- a/Documentation/filesystems/bfs.txt +++ b/Documentation/filesystems/bfs.txt @@ -26,11 +26,11 @@ You can simplify mounting by just typing: this will allocate the first available loopback device (and load loop.o kernel module if necessary) automatically. If the loopback driver is not -loaded automatically, make sure that your kernel is compiled with kmod -support (CONFIG_KMOD) enabled. Beware that umount will not -deallocate /dev/loopN device if /etc/mtab file on your system is a -symbolic link to /proc/mounts. You will need to do it manually using -"-d" switch of losetup(8). Read losetup(8) manpage for more info. +loaded automatically, make sure that you have compiled the module and +that modprobe is functioning. Beware that umount will not deallocate +/dev/loopN device if /etc/mtab file on your system is a symbolic link to +/proc/mounts. You will need to do it manually using "-d" switch of +losetup(8). Read losetup(8) manpage for more info. To create the BFS image under UnixWare you need to find out first which slice contains it. The command prtvtoc(1M) is your friend: diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index 44c97e6accb2655faed63b52e4bc63848571a7ab..fabcb0e00f25d4288c31382699c844d5f3820939 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -311,9 +311,20 @@ the subsystem must be ready for it. [An Example] The best example of these basic concepts is the simple_children -subsystem/group and the simple_child item in configfs_example.c It -shows a trivial object displaying and storing an attribute, and a simple -group creating and destroying these children. +subsystem/group and the simple_child item in configfs_example_explicit.c +and configfs_example_macros.c. It shows a trivial object displaying and +storing an attribute, and a simple group creating and destroying these +children. + +The only difference between configfs_example_explicit.c and +configfs_example_macros.c is how the attributes of the childless item +are defined. The childless item has extended attributes, each with +their own show()/store() operation. This follows a convention commonly +used in sysfs. configfs_example_explicit.c creates these attributes +by explicitly defining the structures involved. Conversely +configfs_example_macros.c uses some convenience macros from configfs.h +to define the attributes. These macros are similar to their sysfs +counterparts. [Hierarchy Navigation and the Subsystem Mutex] diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c deleted file mode 100644 index 25151fd5c2c6f032a0cc95d57d213d7fa4fcdf18..0000000000000000000000000000000000000000 --- a/Documentation/filesystems/configfs/configfs_example.c +++ /dev/null @@ -1,485 +0,0 @@ -/* - * vim: noexpandtab ts=8 sts=0 sw=8: - * - * configfs_example.c - This file is a demonstration module containing - * a number of configfs subsystems. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License as published by the Free Software Foundation; either - * version 2 of the License, or (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, - * Boston, MA 021110-1307, USA. - * - * Based on sysfs: - * sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel - * - * configfs Copyright (C) 2005 Oracle. All rights reserved. - */ - -#include -#include -#include - -#include - - - -/* - * 01-childless - * - * This first example is a childless subsystem. It cannot create - * any config_items. It just has attributes. - * - * Note that we are enclosing the configfs_subsystem inside a container. - * This is not necessary if a subsystem has no attributes directly - * on the subsystem. See the next example, 02-simple-children, for - * such a subsystem. - */ - -struct childless { - struct configfs_subsystem subsys; - int showme; - int storeme; -}; - -struct childless_attribute { - struct configfs_attribute attr; - ssize_t (*show)(struct childless *, char *); - ssize_t (*store)(struct childless *, const char *, size_t); -}; - -static inline struct childless *to_childless(struct config_item *item) -{ - return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL; -} - -static ssize_t childless_showme_read(struct childless *childless, - char *page) -{ - ssize_t pos; - - pos = sprintf(page, "%d\n", childless->showme); - childless->showme++; - - return pos; -} - -static ssize_t childless_storeme_read(struct childless *childless, - char *page) -{ - return sprintf(page, "%d\n", childless->storeme); -} - -static ssize_t childless_storeme_write(struct childless *childless, - const char *page, - size_t count) -{ - unsigned long tmp; - char *p = (char *) page; - - tmp = simple_strtoul(p, &p, 10); - if (!p || (*p && (*p != '\n'))) - return -EINVAL; - - if (tmp > INT_MAX) - return -ERANGE; - - childless->storeme = tmp; - - return count; -} - -static ssize_t childless_description_read(struct childless *childless, - char *page) -{ - return sprintf(page, -"[01-childless]\n" -"\n" -"The childless subsystem is the simplest possible subsystem in\n" -"configfs. It does not support the creation of child config_items.\n" -"It only has a few attributes. In fact, it isn't much different\n" -"than a directory in /proc.\n"); -} - -static struct childless_attribute childless_attr_showme = { - .attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO }, - .show = childless_showme_read, -}; -static struct childless_attribute childless_attr_storeme = { - .attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR }, - .show = childless_storeme_read, - .store = childless_storeme_write, -}; -static struct childless_attribute childless_attr_description = { - .attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO }, - .show = childless_description_read, -}; - -static struct configfs_attribute *childless_attrs[] = { - &childless_attr_showme.attr, - &childless_attr_storeme.attr, - &childless_attr_description.attr, - NULL, -}; - -static ssize_t childless_attr_show(struct config_item *item, - struct configfs_attribute *attr, - char *page) -{ - struct childless *childless = to_childless(item); - struct childless_attribute *childless_attr = - container_of(attr, struct childless_attribute, attr); - ssize_t ret = 0; - - if (childless_attr->show) - ret = childless_attr->show(childless, page); - return ret; -} - -static ssize_t childless_attr_store(struct config_item *item, - struct configfs_attribute *attr, - const char *page, size_t count) -{ - struct childless *childless = to_childless(item); - struct childless_attribute *childless_attr = - container_of(attr, struct childless_attribute, attr); - ssize_t ret = -EINVAL; - - if (childless_attr->store) - ret = childless_attr->store(childless, page, count); - return ret; -} - -static struct configfs_item_operations childless_item_ops = { - .show_attribute = childless_attr_show, - .store_attribute = childless_attr_store, -}; - -static struct config_item_type childless_type = { - .ct_item_ops = &childless_item_ops, - .ct_attrs = childless_attrs, - .ct_owner = THIS_MODULE, -}; - -static struct childless childless_subsys = { - .subsys = { - .su_group = { - .cg_item = { - .ci_namebuf = "01-childless", - .ci_type = &childless_type, - }, - }, - }, -}; - - -/* ----------------------------------------------------------------- */ - -/* - * 02-simple-children - * - * This example merely has a simple one-attribute child. Note that - * there is no extra attribute structure, as the child's attribute is - * known from the get-go. Also, there is no container for the - * subsystem, as it has no attributes of its own. - */ - -struct simple_child { - struct config_item item; - int storeme; -}; - -static inline struct simple_child *to_simple_child(struct config_item *item) -{ - return item ? container_of(item, struct simple_child, item) : NULL; -} - -static struct configfs_attribute simple_child_attr_storeme = { - .ca_owner = THIS_MODULE, - .ca_name = "storeme", - .ca_mode = S_IRUGO | S_IWUSR, -}; - -static struct configfs_attribute *simple_child_attrs[] = { - &simple_child_attr_storeme, - NULL, -}; - -static ssize_t simple_child_attr_show(struct config_item *item, - struct configfs_attribute *attr, - char *page) -{ - ssize_t count; - struct simple_child *simple_child = to_simple_child(item); - - count = sprintf(page, "%d\n", simple_child->storeme); - - return count; -} - -static ssize_t simple_child_attr_store(struct config_item *item, - struct configfs_attribute *attr, - const char *page, size_t count) -{ - struct simple_child *simple_child = to_simple_child(item); - unsigned long tmp; - char *p = (char *) page; - - tmp = simple_strtoul(p, &p, 10); - if (!p || (*p && (*p != '\n'))) - return -EINVAL; - - if (tmp > INT_MAX) - return -ERANGE; - - simple_child->storeme = tmp; - - return count; -} - -static void simple_child_release(struct config_item *item) -{ - kfree(to_simple_child(item)); -} - -static struct configfs_item_operations simple_child_item_ops = { - .release = simple_child_release, - .show_attribute = simple_child_attr_show, - .store_attribute = simple_child_attr_store, -}; - -static struct config_item_type simple_child_type = { - .ct_item_ops = &simple_child_item_ops, - .ct_attrs = simple_child_attrs, - .ct_owner = THIS_MODULE, -}; - - -struct simple_children { - struct config_group group; -}; - -static inline struct simple_children *to_simple_children(struct config_item *item) -{ - return item ? container_of(to_config_group(item), struct simple_children, group) : NULL; -} - -static struct config_item *simple_children_make_item(struct config_group *group, const char *name) -{ - struct simple_child *simple_child; - - simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL); - if (!simple_child) - return NULL; - - - config_item_init_type_name(&simple_child->item, name, - &simple_child_type); - - simple_child->storeme = 0; - - return &simple_child->item; -} - -static struct configfs_attribute simple_children_attr_description = { - .ca_owner = THIS_MODULE, - .ca_name = "description", - .ca_mode = S_IRUGO, -}; - -static struct configfs_attribute *simple_children_attrs[] = { - &simple_children_attr_description, - NULL, -}; - -static ssize_t simple_children_attr_show(struct config_item *item, - struct configfs_attribute *attr, - char *page) -{ - return sprintf(page, -"[02-simple-children]\n" -"\n" -"This subsystem allows the creation of child config_items. These\n" -"items have only one attribute that is readable and writeable.\n"); -} - -static void simple_children_release(struct config_item *item) -{ - kfree(to_simple_children(item)); -} - -static struct configfs_item_operations simple_children_item_ops = { - .release = simple_children_release, - .show_attribute = simple_children_attr_show, -}; - -/* - * Note that, since no extra work is required on ->drop_item(), - * no ->drop_item() is provided. - */ -static struct configfs_group_operations simple_children_group_ops = { - .make_item = simple_children_make_item, -}; - -static struct config_item_type simple_children_type = { - .ct_item_ops = &simple_children_item_ops, - .ct_group_ops = &simple_children_group_ops, - .ct_attrs = simple_children_attrs, - .ct_owner = THIS_MODULE, -}; - -static struct configfs_subsystem simple_children_subsys = { - .su_group = { - .cg_item = { - .ci_namebuf = "02-simple-children", - .ci_type = &simple_children_type, - }, - }, -}; - - -/* ----------------------------------------------------------------- */ - -/* - * 03-group-children - * - * This example reuses the simple_children group from above. However, - * the simple_children group is not the subsystem itself, it is a - * child of the subsystem. Creation of a group in the subsystem creates - * a new simple_children group. That group can then have simple_child - * children of its own. - */ - -static struct config_group *group_children_make_group(struct config_group *group, const char *name) -{ - struct simple_children *simple_children; - - simple_children = kzalloc(sizeof(struct simple_children), - GFP_KERNEL); - if (!simple_children) - return NULL; - - - config_group_init_type_name(&simple_children->group, name, - &simple_children_type); - - return &simple_children->group; -} - -static struct configfs_attribute group_children_attr_description = { - .ca_owner = THIS_MODULE, - .ca_name = "description", - .ca_mode = S_IRUGO, -}; - -static struct configfs_attribute *group_children_attrs[] = { - &group_children_attr_description, - NULL, -}; - -static ssize_t group_children_attr_show(struct config_item *item, - struct configfs_attribute *attr, - char *page) -{ - return sprintf(page, -"[03-group-children]\n" -"\n" -"This subsystem allows the creation of child config_groups. These\n" -"groups are like the subsystem simple-children.\n"); -} - -static struct configfs_item_operations group_children_item_ops = { - .show_attribute = group_children_attr_show, -}; - -/* - * Note that, since no extra work is required on ->drop_item(), - * no ->drop_item() is provided. - */ -static struct configfs_group_operations group_children_group_ops = { - .make_group = group_children_make_group, -}; - -static struct config_item_type group_children_type = { - .ct_item_ops = &group_children_item_ops, - .ct_group_ops = &group_children_group_ops, - .ct_attrs = group_children_attrs, - .ct_owner = THIS_MODULE, -}; - -static struct configfs_subsystem group_children_subsys = { - .su_group = { - .cg_item = { - .ci_namebuf = "03-group-children", - .ci_type = &group_children_type, - }, - }, -}; - -/* ----------------------------------------------------------------- */ - -/* - * We're now done with our subsystem definitions. - * For convenience in this module, here's a list of them all. It - * allows the init function to easily register them. Most modules - * will only have one subsystem, and will only call register_subsystem - * on it directly. - */ -static struct configfs_subsystem *example_subsys[] = { - &childless_subsys.subsys, - &simple_children_subsys, - &group_children_subsys, - NULL, -}; - -static int __init configfs_example_init(void) -{ - int ret; - int i; - struct configfs_subsystem *subsys; - - for (i = 0; example_subsys[i]; i++) { - subsys = example_subsys[i]; - - config_group_init(&subsys->su_group); - mutex_init(&subsys->su_mutex); - ret = configfs_register_subsystem(subsys); - if (ret) { - printk(KERN_ERR "Error %d while registering subsystem %s\n", - ret, - subsys->su_group.cg_item.ci_namebuf); - goto out_unregister; - } - } - - return 0; - -out_unregister: - for (; i >= 0; i--) { - configfs_unregister_subsystem(example_subsys[i]); - } - - return ret; -} - -static void __exit configfs_example_exit(void) -{ - int i; - - for (i = 0; example_subsys[i]; i++) { - configfs_unregister_subsystem(example_subsys[i]); - } -} - -module_init(configfs_example_init); -module_exit(configfs_example_exit); -MODULE_LICENSE("GPL"); diff --git a/Documentation/filesystems/configfs/configfs_example_explicit.c b/Documentation/filesystems/configfs/configfs_example_explicit.c new file mode 100644 index 0000000000000000000000000000000000000000..d428cc9f07f399141e643fc6d8df8ce88433365b --- /dev/null +++ b/Documentation/filesystems/configfs/configfs_example_explicit.c @@ -0,0 +1,485 @@ +/* + * vim: noexpandtab ts=8 sts=0 sw=8: + * + * configfs_example_explicit.c - This file is a demonstration module + * containing a number of configfs subsystems. It explicitly defines + * each structure without using the helper macros defined in + * configfs.h. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + * + * Based on sysfs: + * sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel + * + * configfs Copyright (C) 2005 Oracle. All rights reserved. + */ + +#include +#include +#include + +#include + + + +/* + * 01-childless + * + * This first example is a childless subsystem. It cannot create + * any config_items. It just has attributes. + * + * Note that we are enclosing the configfs_subsystem inside a container. + * This is not necessary if a subsystem has no attributes directly + * on the subsystem. See the next example, 02-simple-children, for + * such a subsystem. + */ + +struct childless { + struct configfs_subsystem subsys; + int showme; + int storeme; +}; + +struct childless_attribute { + struct configfs_attribute attr; + ssize_t (*show)(struct childless *, char *); + ssize_t (*store)(struct childless *, const char *, size_t); +}; + +static inline struct childless *to_childless(struct config_item *item) +{ + return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL; +} + +static ssize_t childless_showme_read(struct childless *childless, + char *page) +{ + ssize_t pos; + + pos = sprintf(page, "%d\n", childless->showme); + childless->showme++; + + return pos; +} + +static ssize_t childless_storeme_read(struct childless *childless, + char *page) +{ + return sprintf(page, "%d\n", childless->storeme); +} + +static ssize_t childless_storeme_write(struct childless *childless, + const char *page, + size_t count) +{ + unsigned long tmp; + char *p = (char *) page; + + tmp = simple_strtoul(p, &p, 10); + if (!p || (*p && (*p != '\n'))) + return -EINVAL; + + if (tmp > INT_MAX) + return -ERANGE; + + childless->storeme = tmp; + + return count; +} + +static ssize_t childless_description_read(struct childless *childless, + char *page) +{ + return sprintf(page, +"[01-childless]\n" +"\n" +"The childless subsystem is the simplest possible subsystem in\n" +"configfs. It does not support the creation of child config_items.\n" +"It only has a few attributes. In fact, it isn't much different\n" +"than a directory in /proc.\n"); +} + +static struct childless_attribute childless_attr_showme = { + .attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO }, + .show = childless_showme_read, +}; +static struct childless_attribute childless_attr_storeme = { + .attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR }, + .show = childless_storeme_read, + .store = childless_storeme_write, +}; +static struct childless_attribute childless_attr_description = { + .attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO }, + .show = childless_description_read, +}; + +static struct configfs_attribute *childless_attrs[] = { + &childless_attr_showme.attr, + &childless_attr_storeme.attr, + &childless_attr_description.attr, + NULL, +}; + +static ssize_t childless_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + struct childless *childless = to_childless(item); + struct childless_attribute *childless_attr = + container_of(attr, struct childless_attribute, attr); + ssize_t ret = 0; + + if (childless_attr->show) + ret = childless_attr->show(childless, page); + return ret; +} + +static ssize_t childless_attr_store(struct config_item *item, + struct configfs_attribute *attr, + const char *page, size_t count) +{ + struct childless *childless = to_childless(item); + struct childless_attribute *childless_attr = + container_of(attr, struct childless_attribute, attr); + ssize_t ret = -EINVAL; + + if (childless_attr->store) + ret = childless_attr->store(childless, page, count); + return ret; +} + +static struct configfs_item_operations childless_item_ops = { + .show_attribute = childless_attr_show, + .store_attribute = childless_attr_store, +}; + +static struct config_item_type childless_type = { + .ct_item_ops = &childless_item_ops, + .ct_attrs = childless_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct childless childless_subsys = { + .subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "01-childless", + .ci_type = &childless_type, + }, + }, + }, +}; + + +/* ----------------------------------------------------------------- */ + +/* + * 02-simple-children + * + * This example merely has a simple one-attribute child. Note that + * there is no extra attribute structure, as the child's attribute is + * known from the get-go. Also, there is no container for the + * subsystem, as it has no attributes of its own. + */ + +struct simple_child { + struct config_item item; + int storeme; +}; + +static inline struct simple_child *to_simple_child(struct config_item *item) +{ + return item ? container_of(item, struct simple_child, item) : NULL; +} + +static struct configfs_attribute simple_child_attr_storeme = { + .ca_owner = THIS_MODULE, + .ca_name = "storeme", + .ca_mode = S_IRUGO | S_IWUSR, +}; + +static struct configfs_attribute *simple_child_attrs[] = { + &simple_child_attr_storeme, + NULL, +}; + +static ssize_t simple_child_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + ssize_t count; + struct simple_child *simple_child = to_simple_child(item); + + count = sprintf(page, "%d\n", simple_child->storeme); + + return count; +} + +static ssize_t simple_child_attr_store(struct config_item *item, + struct configfs_attribute *attr, + const char *page, size_t count) +{ + struct simple_child *simple_child = to_simple_child(item); + unsigned long tmp; + char *p = (char *) page; + + tmp = simple_strtoul(p, &p, 10); + if (!p || (*p && (*p != '\n'))) + return -EINVAL; + + if (tmp > INT_MAX) + return -ERANGE; + + simple_child->storeme = tmp; + + return count; +} + +static void simple_child_release(struct config_item *item) +{ + kfree(to_simple_child(item)); +} + +static struct configfs_item_operations simple_child_item_ops = { + .release = simple_child_release, + .show_attribute = simple_child_attr_show, + .store_attribute = simple_child_attr_store, +}; + +static struct config_item_type simple_child_type = { + .ct_item_ops = &simple_child_item_ops, + .ct_attrs = simple_child_attrs, + .ct_owner = THIS_MODULE, +}; + + +struct simple_children { + struct config_group group; +}; + +static inline struct simple_children *to_simple_children(struct config_item *item) +{ + return item ? container_of(to_config_group(item), struct simple_children, group) : NULL; +} + +static struct config_item *simple_children_make_item(struct config_group *group, const char *name) +{ + struct simple_child *simple_child; + + simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL); + if (!simple_child) + return ERR_PTR(-ENOMEM); + + config_item_init_type_name(&simple_child->item, name, + &simple_child_type); + + simple_child->storeme = 0; + + return &simple_child->item; +} + +static struct configfs_attribute simple_children_attr_description = { + .ca_owner = THIS_MODULE, + .ca_name = "description", + .ca_mode = S_IRUGO, +}; + +static struct configfs_attribute *simple_children_attrs[] = { + &simple_children_attr_description, + NULL, +}; + +static ssize_t simple_children_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + return sprintf(page, +"[02-simple-children]\n" +"\n" +"This subsystem allows the creation of child config_items. These\n" +"items have only one attribute that is readable and writeable.\n"); +} + +static void simple_children_release(struct config_item *item) +{ + kfree(to_simple_children(item)); +} + +static struct configfs_item_operations simple_children_item_ops = { + .release = simple_children_release, + .show_attribute = simple_children_attr_show, +}; + +/* + * Note that, since no extra work is required on ->drop_item(), + * no ->drop_item() is provided. + */ +static struct configfs_group_operations simple_children_group_ops = { + .make_item = simple_children_make_item, +}; + +static struct config_item_type simple_children_type = { + .ct_item_ops = &simple_children_item_ops, + .ct_group_ops = &simple_children_group_ops, + .ct_attrs = simple_children_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct configfs_subsystem simple_children_subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "02-simple-children", + .ci_type = &simple_children_type, + }, + }, +}; + + +/* ----------------------------------------------------------------- */ + +/* + * 03-group-children + * + * This example reuses the simple_children group from above. However, + * the simple_children group is not the subsystem itself, it is a + * child of the subsystem. Creation of a group in the subsystem creates + * a new simple_children group. That group can then have simple_child + * children of its own. + */ + +static struct config_group *group_children_make_group(struct config_group *group, const char *name) +{ + struct simple_children *simple_children; + + simple_children = kzalloc(sizeof(struct simple_children), + GFP_KERNEL); + if (!simple_children) + return ERR_PTR(-ENOMEM); + + config_group_init_type_name(&simple_children->group, name, + &simple_children_type); + + return &simple_children->group; +} + +static struct configfs_attribute group_children_attr_description = { + .ca_owner = THIS_MODULE, + .ca_name = "description", + .ca_mode = S_IRUGO, +}; + +static struct configfs_attribute *group_children_attrs[] = { + &group_children_attr_description, + NULL, +}; + +static ssize_t group_children_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + return sprintf(page, +"[03-group-children]\n" +"\n" +"This subsystem allows the creation of child config_groups. These\n" +"groups are like the subsystem simple-children.\n"); +} + +static struct configfs_item_operations group_children_item_ops = { + .show_attribute = group_children_attr_show, +}; + +/* + * Note that, since no extra work is required on ->drop_item(), + * no ->drop_item() is provided. + */ +static struct configfs_group_operations group_children_group_ops = { + .make_group = group_children_make_group, +}; + +static struct config_item_type group_children_type = { + .ct_item_ops = &group_children_item_ops, + .ct_group_ops = &group_children_group_ops, + .ct_attrs = group_children_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct configfs_subsystem group_children_subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "03-group-children", + .ci_type = &group_children_type, + }, + }, +}; + +/* ----------------------------------------------------------------- */ + +/* + * We're now done with our subsystem definitions. + * For convenience in this module, here's a list of them all. It + * allows the init function to easily register them. Most modules + * will only have one subsystem, and will only call register_subsystem + * on it directly. + */ +static struct configfs_subsystem *example_subsys[] = { + &childless_subsys.subsys, + &simple_children_subsys, + &group_children_subsys, + NULL, +}; + +static int __init configfs_example_init(void) +{ + int ret; + int i; + struct configfs_subsystem *subsys; + + for (i = 0; example_subsys[i]; i++) { + subsys = example_subsys[i]; + + config_group_init(&subsys->su_group); + mutex_init(&subsys->su_mutex); + ret = configfs_register_subsystem(subsys); + if (ret) { + printk(KERN_ERR "Error %d while registering subsystem %s\n", + ret, + subsys->su_group.cg_item.ci_namebuf); + goto out_unregister; + } + } + + return 0; + +out_unregister: + for (; i >= 0; i--) { + configfs_unregister_subsystem(example_subsys[i]); + } + + return ret; +} + +static void __exit configfs_example_exit(void) +{ + int i; + + for (i = 0; example_subsys[i]; i++) { + configfs_unregister_subsystem(example_subsys[i]); + } +} + +module_init(configfs_example_init); +module_exit(configfs_example_exit); +MODULE_LICENSE("GPL"); diff --git a/Documentation/filesystems/configfs/configfs_example_macros.c b/Documentation/filesystems/configfs/configfs_example_macros.c new file mode 100644 index 0000000000000000000000000000000000000000..d8e30a0378aa2bf3540a79da4ff030d402943497 --- /dev/null +++ b/Documentation/filesystems/configfs/configfs_example_macros.c @@ -0,0 +1,448 @@ +/* + * vim: noexpandtab ts=8 sts=0 sw=8: + * + * configfs_example_macros.c - This file is a demonstration module + * containing a number of configfs subsystems. It uses the helper + * macros defined by configfs.h + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + * + * Based on sysfs: + * sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel + * + * configfs Copyright (C) 2005 Oracle. All rights reserved. + */ + +#include +#include +#include + +#include + + + +/* + * 01-childless + * + * This first example is a childless subsystem. It cannot create + * any config_items. It just has attributes. + * + * Note that we are enclosing the configfs_subsystem inside a container. + * This is not necessary if a subsystem has no attributes directly + * on the subsystem. See the next example, 02-simple-children, for + * such a subsystem. + */ + +struct childless { + struct configfs_subsystem subsys; + int showme; + int storeme; +}; + +static inline struct childless *to_childless(struct config_item *item) +{ + return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL; +} + +CONFIGFS_ATTR_STRUCT(childless); +#define CHILDLESS_ATTR(_name, _mode, _show, _store) \ +struct childless_attribute childless_attr_##_name = __CONFIGFS_ATTR(_name, _mode, _show, _store) +#define CHILDLESS_ATTR_RO(_name, _show) \ +struct childless_attribute childless_attr_##_name = __CONFIGFS_ATTR_RO(_name, _show); + +static ssize_t childless_showme_read(struct childless *childless, + char *page) +{ + ssize_t pos; + + pos = sprintf(page, "%d\n", childless->showme); + childless->showme++; + + return pos; +} + +static ssize_t childless_storeme_read(struct childless *childless, + char *page) +{ + return sprintf(page, "%d\n", childless->storeme); +} + +static ssize_t childless_storeme_write(struct childless *childless, + const char *page, + size_t count) +{ + unsigned long tmp; + char *p = (char *) page; + + tmp = simple_strtoul(p, &p, 10); + if (!p || (*p && (*p != '\n'))) + return -EINVAL; + + if (tmp > INT_MAX) + return -ERANGE; + + childless->storeme = tmp; + + return count; +} + +static ssize_t childless_description_read(struct childless *childless, + char *page) +{ + return sprintf(page, +"[01-childless]\n" +"\n" +"The childless subsystem is the simplest possible subsystem in\n" +"configfs. It does not support the creation of child config_items.\n" +"It only has a few attributes. In fact, it isn't much different\n" +"than a directory in /proc.\n"); +} + +CHILDLESS_ATTR_RO(showme, childless_showme_read); +CHILDLESS_ATTR(storeme, S_IRUGO | S_IWUSR, childless_storeme_read, + childless_storeme_write); +CHILDLESS_ATTR_RO(description, childless_description_read); + +static struct configfs_attribute *childless_attrs[] = { + &childless_attr_showme.attr, + &childless_attr_storeme.attr, + &childless_attr_description.attr, + NULL, +}; + +CONFIGFS_ATTR_OPS(childless); +static struct configfs_item_operations childless_item_ops = { + .show_attribute = childless_attr_show, + .store_attribute = childless_attr_store, +}; + +static struct config_item_type childless_type = { + .ct_item_ops = &childless_item_ops, + .ct_attrs = childless_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct childless childless_subsys = { + .subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "01-childless", + .ci_type = &childless_type, + }, + }, + }, +}; + + +/* ----------------------------------------------------------------- */ + +/* + * 02-simple-children + * + * This example merely has a simple one-attribute child. Note that + * there is no extra attribute structure, as the child's attribute is + * known from the get-go. Also, there is no container for the + * subsystem, as it has no attributes of its own. + */ + +struct simple_child { + struct config_item item; + int storeme; +}; + +static inline struct simple_child *to_simple_child(struct config_item *item) +{ + return item ? container_of(item, struct simple_child, item) : NULL; +} + +static struct configfs_attribute simple_child_attr_storeme = { + .ca_owner = THIS_MODULE, + .ca_name = "storeme", + .ca_mode = S_IRUGO | S_IWUSR, +}; + +static struct configfs_attribute *simple_child_attrs[] = { + &simple_child_attr_storeme, + NULL, +}; + +static ssize_t simple_child_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + ssize_t count; + struct simple_child *simple_child = to_simple_child(item); + + count = sprintf(page, "%d\n", simple_child->storeme); + + return count; +} + +static ssize_t simple_child_attr_store(struct config_item *item, + struct configfs_attribute *attr, + const char *page, size_t count) +{ + struct simple_child *simple_child = to_simple_child(item); + unsigned long tmp; + char *p = (char *) page; + + tmp = simple_strtoul(p, &p, 10); + if (!p || (*p && (*p != '\n'))) + return -EINVAL; + + if (tmp > INT_MAX) + return -ERANGE; + + simple_child->storeme = tmp; + + return count; +} + +static void simple_child_release(struct config_item *item) +{ + kfree(to_simple_child(item)); +} + +static struct configfs_item_operations simple_child_item_ops = { + .release = simple_child_release, + .show_attribute = simple_child_attr_show, + .store_attribute = simple_child_attr_store, +}; + +static struct config_item_type simple_child_type = { + .ct_item_ops = &simple_child_item_ops, + .ct_attrs = simple_child_attrs, + .ct_owner = THIS_MODULE, +}; + + +struct simple_children { + struct config_group group; +}; + +static inline struct simple_children *to_simple_children(struct config_item *item) +{ + return item ? container_of(to_config_group(item), struct simple_children, group) : NULL; +} + +static struct config_item *simple_children_make_item(struct config_group *group, const char *name) +{ + struct simple_child *simple_child; + + simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL); + if (!simple_child) + return ERR_PTR(-ENOMEM); + + config_item_init_type_name(&simple_child->item, name, + &simple_child_type); + + simple_child->storeme = 0; + + return &simple_child->item; +} + +static struct configfs_attribute simple_children_attr_description = { + .ca_owner = THIS_MODULE, + .ca_name = "description", + .ca_mode = S_IRUGO, +}; + +static struct configfs_attribute *simple_children_attrs[] = { + &simple_children_attr_description, + NULL, +}; + +static ssize_t simple_children_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + return sprintf(page, +"[02-simple-children]\n" +"\n" +"This subsystem allows the creation of child config_items. These\n" +"items have only one attribute that is readable and writeable.\n"); +} + +static void simple_children_release(struct config_item *item) +{ + kfree(to_simple_children(item)); +} + +static struct configfs_item_operations simple_children_item_ops = { + .release = simple_children_release, + .show_attribute = simple_children_attr_show, +}; + +/* + * Note that, since no extra work is required on ->drop_item(), + * no ->drop_item() is provided. + */ +static struct configfs_group_operations simple_children_group_ops = { + .make_item = simple_children_make_item, +}; + +static struct config_item_type simple_children_type = { + .ct_item_ops = &simple_children_item_ops, + .ct_group_ops = &simple_children_group_ops, + .ct_attrs = simple_children_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct configfs_subsystem simple_children_subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "02-simple-children", + .ci_type = &simple_children_type, + }, + }, +}; + + +/* ----------------------------------------------------------------- */ + +/* + * 03-group-children + * + * This example reuses the simple_children group from above. However, + * the simple_children group is not the subsystem itself, it is a + * child of the subsystem. Creation of a group in the subsystem creates + * a new simple_children group. That group can then have simple_child + * children of its own. + */ + +static struct config_group *group_children_make_group(struct config_group *group, const char *name) +{ + struct simple_children *simple_children; + + simple_children = kzalloc(sizeof(struct simple_children), + GFP_KERNEL); + if (!simple_children) + return ERR_PTR(-ENOMEM); + + config_group_init_type_name(&simple_children->group, name, + &simple_children_type); + + return &simple_children->group; +} + +static struct configfs_attribute group_children_attr_description = { + .ca_owner = THIS_MODULE, + .ca_name = "description", + .ca_mode = S_IRUGO, +}; + +static struct configfs_attribute *group_children_attrs[] = { + &group_children_attr_description, + NULL, +}; + +static ssize_t group_children_attr_show(struct config_item *item, + struct configfs_attribute *attr, + char *page) +{ + return sprintf(page, +"[03-group-children]\n" +"\n" +"This subsystem allows the creation of child config_groups. These\n" +"groups are like the subsystem simple-children.\n"); +} + +static struct configfs_item_operations group_children_item_ops = { + .show_attribute = group_children_attr_show, +}; + +/* + * Note that, since no extra work is required on ->drop_item(), + * no ->drop_item() is provided. + */ +static struct configfs_group_operations group_children_group_ops = { + .make_group = group_children_make_group, +}; + +static struct config_item_type group_children_type = { + .ct_item_ops = &group_children_item_ops, + .ct_group_ops = &group_children_group_ops, + .ct_attrs = group_children_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct configfs_subsystem group_children_subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "03-group-children", + .ci_type = &group_children_type, + }, + }, +}; + +/* ----------------------------------------------------------------- */ + +/* + * We're now done with our subsystem definitions. + * For convenience in this module, here's a list of them all. It + * allows the init function to easily register them. Most modules + * will only have one subsystem, and will only call register_subsystem + * on it directly. + */ +static struct configfs_subsystem *example_subsys[] = { + &childless_subsys.subsys, + &simple_children_subsys, + &group_children_subsys, + NULL, +}; + +static int __init configfs_example_init(void) +{ + int ret; + int i; + struct configfs_subsystem *subsys; + + for (i = 0; example_subsys[i]; i++) { + subsys = example_subsys[i]; + + config_group_init(&subsys->su_group); + mutex_init(&subsys->su_mutex); + ret = configfs_register_subsystem(subsys); + if (ret) { + printk(KERN_ERR "Error %d while registering subsystem %s\n", + ret, + subsys->su_group.cg_item.ci_namebuf); + goto out_unregister; + } + } + + return 0; + +out_unregister: + for (; i >= 0; i--) { + configfs_unregister_subsystem(example_subsys[i]); + } + + return ret; +} + +static void __exit configfs_example_exit(void) +{ + int i; + + for (i = 0; example_subsys[i]; i++) { + configfs_unregister_subsystem(example_subsys[i]); + } +} + +module_init(configfs_example_init); +module_exit(configfs_example_exit); +MODULE_LICENSE("GPL"); diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt index d0ec45ae4e7dfa4585fa74c0757be4d80dc02bf5..44bd766f2e5d2225488337681a948c160bd2a821 100644 --- a/Documentation/filesystems/nfs-rdma.txt +++ b/Documentation/filesystems/nfs-rdma.txt @@ -5,7 +5,7 @@ ################################################################################ Author: NetApp and Open Grid Computing - Date: April 15, 2008 + Date: May 29, 2008 Table of Contents ~~~~~~~~~~~~~~~~~ @@ -60,16 +60,18 @@ Installation The procedures described in this document have been tested with distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). - - Install nfs-utils-1.1.1 or greater on the client + - Install nfs-utils-1.1.2 or greater on the client - An NFS/RDMA mount point can only be obtained by using the mount.nfs - command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs - you are using, type: + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: - > /sbin/mount.nfs -V + $ /sbin/mount.nfs -V - If the version is less than 1.1.1 or the command does not exist, - then you will need to install the latest version of nfs-utils. + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. Download the latest package from: @@ -77,22 +79,33 @@ Installation Uncompress the package and follow the installation instructions. - If you will not be using GSS and NFSv4, the installation process - can be simplified by disabling these features when running configure: + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: - > ./configure --disable-gss --disable-nfsv4 + $ ./configure --disable-gss --disable-nfsv4 - For more information on this see the package's README and INSTALL files. + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. After building the nfs-utils package, there will be a mount.nfs binary in the utils/mount directory. This binary can be used to initiate NFS v2, v3, - or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4. - The standard technique is to create a symlink called mount.nfs4 to mount.nfs. + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. - NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount commmand. + + NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed on the NFS client machine. You do not need this specific version of nfs-utils on the server. Furthermore, only the mount.nfs command from - nfs-utils-1.1.1 is needed on the client. + nfs-utils-1.1.2 is needed on the client. - Install a Linux kernel with NFS/RDMA @@ -156,8 +169,8 @@ Check RDMA and NFS Setup this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel card: - > modprobe ib_mthca - > modprobe ib_ipoib + $ modprobe ib_mthca + $ modprobe ib_ipoib If you are using InfiniBand, make sure there is a Subnet Manager (SM) running on the network. If your IB switch has an embedded SM, you can @@ -166,7 +179,7 @@ Check RDMA and NFS Setup If an SM is running on your network, you should see the following: - > cat /sys/class/infiniband/driverX/ports/1/state + $ cat /sys/class/infiniband/driverX/ports/1/state 4: ACTIVE where driverX is mthca0, ipath5, ehca3, etc. @@ -174,10 +187,10 @@ Check RDMA and NFS Setup To further test the InfiniBand software stack, use IPoIB (this assumes you have two IB hosts named host1 and host2): - host1> ifconfig ib0 a.b.c.x - host2> ifconfig ib0 a.b.c.y - host1> ping a.b.c.y - host2> ping a.b.c.x + host1$ ifconfig ib0 a.b.c.x + host2$ ifconfig ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x For other device types, follow the appropriate procedures. @@ -202,11 +215,11 @@ NFS/RDMA Setup /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) - The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the - cleint's iWARP address(es) for an RNIC. + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the cleint's iWARP address(es) for an RNIC. - NOTE: The "insecure" option must be used because the NFS/RDMA client does not - use a reserved port. + NOTE: The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. Each time a machine boots: @@ -214,43 +227,45 @@ NFS/RDMA Setup For InfiniBand using a Mellanox adapter: - > modprobe ib_mthca - > modprobe ib_ipoib - > ifconfig ib0 a.b.c.d + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ifconfig ib0 a.b.c.d NOTE: use unique addresses for the client and server - Start the NFS server - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), - load the RDMA transport module: + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: - > modprobe svcrdma + $ modprobe svcrdma - Regardless of how the server was built (module or built-in), start the server: + Regardless of how the server was built (module or built-in), start the + server: - > /etc/init.d/nfs start + $ /etc/init.d/nfs start or - > service nfs start + $ service nfs start Instruct the server to listen on the RDMA transport: - > echo rdma 2050 > /proc/fs/nfsd/portlist + $ echo rdma 2050 > /proc/fs/nfsd/portlist - On the client system - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), - load the RDMA client module: + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: - > modprobe xprtrdma.ko + $ modprobe xprtrdma.ko - Regardless of how the client was built (module or built-in), issue the mount.nfs command: + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: - > /path/to/your/mount.nfs :/ /mnt -i -o rdma,port=2050 + $ mount -o rdma,port=2050 :/ /mnt - To verify that the mount is using RDMA, run "cat /proc/mounts" and check the - "proto" field for the given mount. + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. Congratulations! You're using NFS/RDMA! diff --git a/Documentation/filesystems/omfs.txt b/Documentation/filesystems/omfs.txt new file mode 100644 index 0000000000000000000000000000000000000000..1d0d41ff5c65cdc75454d958ad58b367a9e2464c --- /dev/null +++ b/Documentation/filesystems/omfs.txt @@ -0,0 +1,106 @@ +Optimized MPEG Filesystem (OMFS) + +Overview +======== + +OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR +and Rio Karma MP3 player. The filesystem is extent-based, utilizing +block sizes from 2k to 8k, with hash-based directories. This +filesystem driver may be used to read and write disks from these +devices. + +Note, it is not recommended that this FS be used in place of a general +filesystem for your own streaming media device. Native Linux filesystems +will likely perform better. + +More information is available at: + + http://linux-karma.sf.net/ + +Various utilities, including mkomfs and omfsck, are included with +omfsprogs, available at: + + http://bobcopeland.com/karma/ + +Instructions are included in its README. + +Options +======= + +OMFS supports the following mount-time options: + + uid=n - make all files owned by specified user + gid=n - make all files owned by specified group + umask=xxx - set permission umask to xxx + fmask=xxx - set umask to xxx for files + dmask=xxx - set umask to xxx for directories + +Disk format +=========== + +OMFS discriminates between "sysblocks" and normal data blocks. The sysblock +group consists of super block information, file metadata, directory structures, +and extents. Each sysblock has a header containing CRCs of the entire +sysblock, and may be mirrored in successive blocks on the disk. A sysblock may +have a smaller size than a data block, but since they are both addressed by the +same 64-bit block number, any remaining space in the smaller sysblock is +unused. + +Sysblock header information: + +struct omfs_header { + __be64 h_self; /* FS block where this is located */ + __be32 h_body_size; /* size of useful data after header */ + __be16 h_crc; /* crc-ccitt of body_size bytes */ + char h_fill1[2]; + u8 h_version; /* version, always 1 */ + char h_type; /* OMFS_INODE_X */ + u8 h_magic; /* OMFS_IMAGIC */ + u8 h_check_xor; /* XOR of header bytes before this */ + __be32 h_fill2; +}; + +Files and directories are both represented by omfs_inode: + +struct omfs_inode { + struct omfs_header i_head; /* header */ + __be64 i_parent; /* parent containing this inode */ + __be64 i_sibling; /* next inode in hash bucket */ + __be64 i_ctime; /* ctime, in milliseconds */ + char i_fill1[35]; + char i_type; /* OMFS_[DIR,FILE] */ + __be32 i_fill2; + char i_fill3[64]; + char i_name[OMFS_NAMELEN]; /* filename */ + __be64 i_size; /* size of file, in bytes */ +}; + +Directories in OMFS are implemented as a large hash table. Filenames are +hashed then prepended into the bucket list beginning at OMFS_DIR_START. +Lookup requires hashing the filename, then seeking across i_sibling pointers +until a match is found on i_name. Empty buckets are represented by block +pointers with all-1s (~0). + +A file is an omfs_inode structure followed by an extent table beginning at +OMFS_EXTENT_START: + +struct omfs_extent_entry { + __be64 e_cluster; /* start location of a set of blocks */ + __be64 e_blocks; /* number of blocks after e_cluster */ +}; + +struct omfs_extent { + __be64 e_next; /* next extent table location */ + __be32 e_extent_count; /* total # extents in this table */ + __be32 e_fill; + struct omfs_extent_entry e_entry; /* start of extent entries */ +}; + +Each extent holds the block offset followed by number of blocks allocated to +the extent. The final extent in each table is a terminator with e_cluster +being ~0 and e_blocks being ones'-complement of the total number of blocks +in the table. + +If this table overflows, a continuation inode is written and pointed to by +e_next. These have a header but lack the rest of the inode structure. + diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 7f268f327d750e725f1e5ca5ddfe99dcb8014cd5..64557821ee5984f5babe336f701f172b8602a4dc 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -296,6 +296,7 @@ Table 1-4: Kernel info in /proc uptime System uptime version Kernel version video bttv info of video resources (2.4) + vmallocinfo Show vmalloced areas .............................................................................. You can, for example, check which interrupts are currently in use and what @@ -557,6 +558,49 @@ VmallocTotal: total size of vmalloc memory area VmallocUsed: amount of vmalloc area which is used VmallocChunk: largest contigious block of vmalloc area which is free +.............................................................................. + +vmallocinfo: + +Provides information about vmalloced/vmaped areas. One line per area, +containing the virtual address range of the area, size in bytes, +caller information of the creator, and optional information depending +on the kind of area : + + pages=nr number of pages + phys=addr if a physical address was specified + ioremap I/O mapping (ioremap() and friends) + vmalloc vmalloc() area + vmap vmap()ed pages + user VM_USERMAP area + vpages buffer for pages pointers was vmalloced (huge area) + N=nr (Only on NUMA kernels) + Number of pages allocated on memory node + +> cat /proc/vmallocinfo +0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ... + /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128 +0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ... + /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 +0xffffc20000302000-0xffffc20000304000 8192 acpi_tb_verify_table+0x21/0x4f... + phys=7fee8000 ioremap +0xffffc20000304000-0xffffc20000307000 12288 acpi_tb_verify_table+0x21/0x4f... + phys=7fee7000 ioremap +0xffffc2000031d000-0xffffc2000031f000 8192 init_vdso_vars+0x112/0x210 +0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e ... + /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3 +0xffffc2000033a000-0xffffc2000033d000 12288 sys_swapon+0x640/0xac0 ... + pages=2 vmalloc N1=2 +0xffffc20000347000-0xffffc2000034c000 20480 xt_alloc_table_info+0xfe ... + /0x130 [x_tables] pages=4 vmalloc N0=4 +0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 ... + pages=14 vmalloc N2=14 +0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 ... + pages=4 vmalloc N1=4 +0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 ... + pages=2 vmalloc N1=2 +0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ... + pages=10 vmalloc N0=10 1.3 IDE devices in /proc/ide ---------------------------- @@ -887,7 +931,7 @@ group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req stats stream_req mb_groups: -This file gives the details of mutiblock allocator buddy cache of free blocks +This file gives the details of multiblock allocator buddy cache of free blocks mb_history: Multiblock allocation history. @@ -1430,7 +1474,7 @@ used because pages_free(1355) is smaller than watermark + protection[2] normal page requirement. If requirement is DMA zone(index=0), protection[0] (=0) is used. -zone[i]'s protection[j] is calculated by following exprssion. +zone[i]'s protection[j] is calculated by following expression. (i < j): zone[i]->protection[j] diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt index 094f2d2f38b1a5571283ee745aa494c12b12808c..510b722667ac885cf7abed3732961941a5ad1c14 100644 --- a/Documentation/filesystems/relay.txt +++ b/Documentation/filesystems/relay.txt @@ -294,6 +294,16 @@ user-defined data with a channel, and is immediately available (including in create_buf_file()) via chan->private_data or buf->chan->private_data. +Buffer-only channels +-------------------- + +These channels have no files associated and can be created with +relay_open(NULL, NULL, ...). Such channels are useful in scenarios such +as when doing early tracing in the kernel, before the VFS is up. In these +cases, one may open a buffer-only channel and then call +relay_late_setup_files() when the kernel is ready to handle files, +to expose the buffered data to the userspace. + Channel 'modes' --------------- diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 7f27b8f840d06210df1511db6ef603cedb8f0e58..9e9c348275a96fd362acf9916d2789be8a08c4c8 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -248,6 +248,7 @@ The top level sysfs directory looks like: block/ bus/ class/ +dev/ devices/ firmware/ net/ @@ -274,6 +275,11 @@ fs/ contains a directory for some filesystems. Currently each filesystem wanting to export attributes must create its own hierarchy below fs/ (see ./fuse.txt for an example). +dev/ contains two directories char/ and block/. Inside these two +directories there are symlinks named :. These symlinks +point to the sysfs directory for the given device. /sys/dev provides a +quick way to lookup the sysfs interface for a device from the result of +a stat(2) operation. More information can driver-model specific features can be found in Documentation/driver-model/. diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt new file mode 100644 index 0000000000000000000000000000000000000000..540e9e7f59c53f083f105432a7a7be28f53eb2d3 --- /dev/null +++ b/Documentation/filesystems/ubifs.txt @@ -0,0 +1,164 @@ +Introduction +============= + +UBIFS file-system stands for UBI File System. UBI stands for "Unsorted +Block Images". UBIFS is a flash file system, which means it is designed +to work with flash devices. It is important to understand, that UBIFS +is completely different to any traditional file-system in Linux, like +Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems +which work with MTD devices, not block devices. The other Linux +file-system of this class is JFFS2. + +To make it more clear, here is a small comparison of MTD devices and +block devices. + +1 MTD devices represent flash devices and they consist of eraseblocks of + rather large size, typically about 128KiB. Block devices consist of + small blocks, typically 512 bytes. +2 MTD devices support 3 main operations - read from some offset within an + eraseblock, write to some offset within an eraseblock, and erase a whole + eraseblock. Block devices support 2 main operations - read a whole + block and write a whole block. +3 The whole eraseblock has to be erased before it becomes possible to + re-write its contents. Blocks may be just re-written. +4 Eraseblocks become worn out after some number of erase cycles - + typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC + NAND flashes. Blocks do not have the wear-out property. +5 Eraseblocks may become bad (only on NAND flashes) and software should + deal with this. Blocks on hard drives typically do not become bad, + because hardware has mechanisms to substitute bad blocks, at least in + modern LBA disks. + +It should be quite obvious why UBIFS is very different to traditional +file-systems. + +UBIFS works on top of UBI. UBI is a separate software layer which may be +found in drivers/mtd/ubi. UBI is basically a volume management and +wear-leveling layer. It provides so called UBI volumes which is a higher +level abstraction than a MTD device. The programming model of UBI devices +is very similar to MTD devices - they still consist of large eraseblocks, +they have read/write/erase operations, but UBI devices are devoid of +limitations like wear and bad blocks (items 4 and 5 in the above list). + +In a sense, UBIFS is a next generation of JFFS2 file-system, but it is +very different and incompatible to JFFS2. The following are the main +differences. + +* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on + top of UBI volumes. +* JFFS2 does not have on-media index and has to build it while mounting, + which requires full media scan. UBIFS maintains the FS indexing + information on the flash media and does not require full media scan, + so it mounts many times faster than JFFS2. +* JFFS2 is a write-through file-system, while UBIFS supports write-back, + which makes UBIFS much faster on writes. + +Similarly to JFFS2, UBIFS supports on-the-flight compression which makes +it possible to fit quite a lot of data to the flash. + +Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts. +It does not need stuff like ckfs.ext2. UBIFS automatically replays its +journal and recovers from crashes, ensuring that the on-flash data +structures are consistent. + +UBIFS scales logarithmically (most of the data structures it uses are +trees), so the mount time and memory consumption do not linearly depend +on the flash size, like in case of JFFS2. This is because UBIFS +maintains the FS index on the flash media. However, UBIFS depends on +UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly. +Nevertheless, UBI/UBIFS scales considerably better than JFFS2. + +The authors of UBIFS believe, that it is possible to develop UBI2 which +would scale logarithmically as well. UBI2 would support the same API as UBI, +but it would be binary incompatible to UBI. So UBIFS would not need to be +changed to use UBI2 + + +Mount options +============= + +(*) == default. + +norm_unmount (*) commit on unmount; the journal is committed + when the file-system is unmounted so that the + next mount does not have to replay the journal + and it becomes very fast; +fast_unmount do not commit on unmount; this option makes + unmount faster, but the next mount slower + because of the need to replay the journal. + + +Quick usage instructions +======================== + +The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax, +where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is +UBI volume name. + +Mount volume 0 on UBI device 0 to /mnt/ubifs: +$ mount -t ubifs ubi0_0 /mnt/ubifs + +Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume +name): +$ mount -t ubifs ubi0:rootfs /mnt/ubifs + +The following is an example of the kernel boot arguments to attach mtd0 +to UBI and mount volume "rootfs": +ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs + + +Module Parameters for Debugging +=============================== + +When UBIFS has been compiled with debugging enabled, there are 3 module +parameters that are available to control aspects of testing and debugging. +The parameters are unsigned integers where each bit controls an option. +The parameters are: + +debug_msgs Selects which debug messages to display, as follows: + + Message Type Flag value + + General messages 1 + Journal messages 2 + Mount messages 4 + Commit messages 8 + LEB search messages 16 + Budgeting messages 32 + Garbage collection messages 64 + Tree Node Cache (TNC) messages 128 + LEB properties (lprops) messages 256 + Input/output messages 512 + Log messages 1024 + Scan messages 2048 + Recovery messages 4096 + +debug_chks Selects extra checks that UBIFS can do while running: + + Check Flag value + + General checks 1 + Check Tree Node Cache (TNC) 2 + Check indexing tree size 4 + Check orphan area 8 + Check old indexing tree 16 + Check LEB properties (lprops) 32 + Check leaf nodes and inodes 64 + +debug_tsts Selects a mode of testing, as follows: + + Test mode Flag value + + Force in-the-gaps method 2 + Failure mode for recovery testing 4 + +For example, set debug_msgs to 5 to display General messages and Mount +messages. + + +References +========== + +UBIFS documentation and FAQ/HOWTO at the MTD web site: +http://www.linux-mtd.infradead.org/doc/ubifs.html +http://www.linux-mtd.infradead.org/faq/ubifs.html diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index 2d5e1e582e13272bfaef2fbc494a156d829cc289..bbac4f1d90567c3f7ea0f46a869c091160390076 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -96,6 +96,14 @@ shortname=lower|win95|winnt|mixed emulate the Windows 95 rule for create. Default setting is `lower'. +tz=UTC -- Interpret timestamps as UTC rather than local time. + This option disables the conversion of timestamps + between local time (as used by Windows on FAT) and UTC + (which Linux uses internally). This is particuluarly + useful when mounting devices (like digital cameras) + that are set to UTC in order to avoid the pitfalls of + local time. + : 0,1,yes,no,true,false TODO diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index b7522c6cbae3758f77cb12e22b78825e43a3796d..c4d348dabe9499454055ebff41fb88ef71346dda 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -143,7 +143,7 @@ struct file_system_type { The get_sb() method has the following arguments: - struct file_system_type *fs_type: decribes the filesystem, partly initialized + struct file_system_type *fs_type: describes the filesystem, partly initialized by the specific filesystem code int flags: mount flags @@ -895,9 +895,9 @@ struct dentry_operations { iput() yourself d_dname: called when the pathname of a dentry should be generated. - Usefull for some pseudo filesystems (sockfs, pipefs, ...) to delay + Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay pathname generation. (Instead of doing it when dentry is created, - its done only when the path is needed.). Real filesystems probably + it's done only when the path is needed.). Real filesystems probably dont want to use it, because their dentries are present in global dcache hash, so their hash should be an invariant. As no lock is held, d_dname() should not try to modify the dentry itself, unless diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 77d3faa1a61116c5646a00d10d5c4832279aaede..d330fe3103da9c9a3cb8f888ac7255ce48e666d4 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt @@ -4,9 +4,11 @@ Copyright 2008 Red Hat Inc. Author: Steven Rostedt License: The GNU Free Documentation License, Version 1.2 -Reviewers: Elias Oltmanns and Randy Dunlap + (dual licensed under the GPL v2) +Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, + John Kacur, and David Teigland. -Writen for: 2.6.26-rc8 linux-2.6-tip.git tip/tracing/ftrace branch +Written for: 2.6.27-rc1 Introduction ------------ @@ -18,10 +20,11 @@ issues that take place outside of user-space. Although ftrace is the function tracer, it also includes an infrastructure that allows for other types of tracing. Some of the -tracers that are currently in ftrace is a tracer to trace +tracers that are currently in ftrace include a tracer to trace context switches, the time it takes for a high priority task to run after it was woken up, the time interrupts are disabled, and -more. +more (ftrace allows for tracer plugins, which means that the list of +tracers can always grow). The File System @@ -35,6 +38,8 @@ To mount the debugfs system: # mkdir /debug # mount -t debugfs nodev /debug +(Note: it is more common to mount at /sys/kernel/debug, but for simplicity + this document will use /debug) That's it! (assuming that you have ftrace configured into your kernel) @@ -50,20 +55,19 @@ of ftrace. Here is a list of some of the key files: available_tracers : This holds the different types of tracers that have been compiled into the kernel. The tracers - listed here can be configured by echoing in their - name into current_tracer. + listed here can be configured by echoing their name + into current_tracer. tracing_enabled : This sets or displays whether the current_tracer is activated and tracing or not. Echo 0 into this - file to disable the tracer or 1 (or non-zero) to - enable it. + file to disable the tracer or 1 to enable it. trace : This file holds the output of the trace in a human readable - format. + format (described below). latency_trace : This file shows the same trace but the information is organized more to display possible latencies - in the system. + in the system (described below). trace_pipe : The output is the same as the "trace" file but this file is meant to be streamed with live tracing. @@ -75,7 +79,7 @@ of ftrace. Here is a list of some of the key files: file, it is consumed, and will not be read again with a sequential read. The "trace" and "latency_trace" files are static, and if the - tracer isn't adding more data, they will display + tracer is not adding more data, they will display the same information every time they are read. iter_ctrl : This file lets the user control the amount of data @@ -92,10 +96,10 @@ of ftrace. Here is a list of some of the key files: trace_entries : This sets or displays the number of trace entries each CPU buffer can hold. The tracer buffers - are the same size for each CPU, so care must be - taken when modifying the trace_entries. The trace - buffers are allocated in pages (blocks of memory that - the kernel uses for allocation, usually 4 KB in size). + are the same size for each CPU. The displayed number + is the size of the CPU buffer and not total size. The + trace buffers are allocated in pages (blocks of memory + that the kernel uses for allocation, usually 4 KB in size). Since each entry is smaller than a page, if the last allocated page has room for more entries than were requested, the rest of the page is used to allocate @@ -112,20 +116,19 @@ of ftrace. Here is a list of some of the key files: on specified CPUS. The format is a hex string representing the CPUS. - set_ftrace_filter : When dynamic ftrace is configured in, the - code is dynamically modified to disable calling - of the function profiler (mcount). This lets - tracing be configured in with practically no overhead - in performance. This also has a side effect of - enabling or disabling specific functions to be - traced. Echoing in names of functions into this - file will limit the trace to only these functions. - - set_ftrace_notrace: This has the opposite effect that - set_ftrace_filter has. Any function that is added - here will not be traced. If a function exists - in both set_ftrace_filter and set_ftrace_notrace, - the function will _not_ be traced. + set_ftrace_filter : When dynamic ftrace is configured in (see the + section below "dynamic ftrace"), the code is dynamically + modified (code text rewrite) to disable calling of the + function profiler (mcount). This lets tracing be configured + in with practically no overhead in performance. This also + has a side effect of enabling or disabling specific functions + to be traced. Echoing names of functions into this file + will limit the trace to only those functions. + + set_ftrace_notrace: This has an effect opposite to that of + set_ftrace_filter. Any function that is added here will not + be traced. If a function exists in both set_ftrace_filter + and set_ftrace_notrace, the function will _not_ be traced. available_filter_functions : When a function is encountered the first time by the dynamic tracer, it is recorded and @@ -133,32 +136,31 @@ of ftrace. Here is a list of some of the key files: lists the functions that have been recorded by the dynamic tracer and these functions can be used to set the ftrace filter by the above - "set_ftrace_filter" file. + "set_ftrace_filter" file. (See the section "dynamic ftrace" + below for more details). The Tracers ----------- -Here are the list of current tracers that can be configured. +Here is the list of current tracers that may be configured. ftrace - function tracer that uses mcount to trace all functions. - It is possible to filter out which functions that are - to be traced when dynamic ftrace is configured in. sched_switch - traces the context switches between tasks. - irqsoff - traces the areas that disable interrupts and saves off + irqsoff - traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency. When a new max is recorded, it replaces the old trace. It is best to view this - trace with the latency_trace file. + trace via the latency_trace file. - preemptoff - Similar to irqsoff but traces and records the time - preemption is disabled. + preemptoff - Similar to irqsoff but traces and records the amount of + time for which preemption is disabled. preemptirqsoff - Similar to irqsoff and preemptoff, but traces and - records the largest time irqs and/or preemption is - disabled. + records the largest time for which irqs and/or preemption + is disabled. wakeup - Traces and records the max latency that it takes for the highest priority task to get scheduled after @@ -171,13 +173,13 @@ Here are the list of current tracers that can be configured. Examples of using the tracer ---------------------------- -Here are typical examples of using the tracers with only controlling -them with the debugfs interface (without using any user-land utilities). +Here are typical examples of using the tracers when controlling them only +with the debugfs interface (without using any user-land utilities). Output format: -------------- -Here's an example of the output format of the file "trace" +Here is an example of the output format of the file "trace" -------- # tracer: ftrace @@ -189,14 +191,15 @@ Here's an example of the output format of the file "trace" bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput -------- -A header is printed with the trace that is represented. In this case -the tracer is "ftrace". Then a header showing the format. Task name -"bash", the task PID "4251", the CPU that it was running on +A header is printed with the tracer name that is represented by the trace. +In this case the tracer is "ftrace". Then a header showing the format. Task +name "bash", the task PID "4251", the CPU that it was running on "01", the timestamp in . format, the function name that was traced "path_put" and the parent function that called this function -"path_walk". +"path_walk". The timestamp is the time at which the function was +entered. -The sched_switch tracer also includes tracing of task wake ups and +The sched_switch tracer also includes tracing of task wakeups and context switches. ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S @@ -206,7 +209,7 @@ context switches. kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R -Wake ups are represented by a "+" and the context switches show +Wake ups are represented by a "+" and the context switches are shown as "==>". The format is: Context switches: @@ -221,7 +224,7 @@ Wake ups are represented by a "+" and the context switches show :: + :: -The prio is the internal kernel priority, which is inverse to the +The prio is the internal kernel priority, which is the inverse of the priority that is usually displayed by user-space tools. Zero represents the highest priority (99). Prio 100 starts the "nice" priorities with 100 being equal to nice -20 and 139 being nice 19. The prio "140" is @@ -232,7 +235,7 @@ Latency trace format -------------------- For traces that display latency times, the latency_trace file gives -a bit more information to see why a latency happened. Here's a typical +somewhat more information to see why a latency happened. Here is a typical trace. # tracer: irqsoff @@ -260,21 +263,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 -0 0d.s1 98us : trace_hardirqs_on (do_softirq) -vim:ft=help - -This shows that the current tracer is "irqsoff" tracing the time -interrupts are disabled. It gives the trace version and the kernel -this was executed on (2.6.26-rc8). Then it displays the max latency -in microsecs (97 us). The number of trace entries displayed -by the total number recorded (both are three: #3/3). The type of +This shows that the current tracer is "irqsoff" tracing the time for which +interrupts were disabled. It gives the trace version and the version +of the kernel upon which this was executed on (2.6.26-rc8). Then it displays +the max latency in microsecs (97 us). The number of trace entries displayed +and the total number recorded (both are three: #3/3). The type of preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero -and reserved for later use. #P is the number of online CPUS (#P:2). +and are reserved for later use. #P is the number of online CPUS (#P:2). -The task is the process that was running when the latency happened. +The task is the process that was running when the latency occurred. (swapper pid: 0). -The start and stop that caused the latencies: +The start and stop (the functions in which the interrupts were disabled and +enabled respectively) that caused the latencies: apic_timer_interrupt is where the interrupts were disabled. do_softirq is where they were enabled again. @@ -286,14 +288,14 @@ explains which is which. pid: The PID of that process. - CPU#: The CPU that the process was running on. + CPU#: The CPU which the process was running on. irqs-off: 'd' interrupts are disabled. '.' otherwise. need-resched: 'N' task need_resched is set, '.' otherwise. hardirq/softirq: - 'H' - hard irq happened inside a softirq. + 'H' - hard irq occurred inside a softirq. 'h' - hard irq is running 's' - soft irq is running '.' - normal context. @@ -303,7 +305,7 @@ explains which is which. The above is mostly meaningful for kernel developers. time: This differs from the trace file output. The trace file output - included an absolute timestamp. The timestamp used by the + includes an absolute timestamp. The timestamp used by the latency_trace file is relative to the start of the trace. delay: This is just to help catch your eye a bit better. And @@ -385,7 +387,7 @@ Here are the available options: sched_switch ------------ -This tracer simply records schedule switches. Here's an example +This tracer simply records schedule switches. Here is an example of how to use it. # echo sched_switch > /debug/tracing/current_tracer @@ -421,8 +423,8 @@ the name of the trace and points to the options. The "FUNCTION" is a misnomer since here it represents the wake ups and context switches. -The sched_switch only lists the wake ups (represented with '+') -and context switches ('==>') with the previous task or current +The sched_switch file only lists the wake ups (represented with '+') +and context switches ('==>') with the previous task or current task first followed by the next task or task waking up. The format for both of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO is the inverse of the actual priority with zero (0) being the highest @@ -437,7 +439,8 @@ The task states are: R - running : wants to run, may not actually be running S - sleep : process is waiting to be woken up (handles signals) - D - deep sleep : process must be woken up (ignores signals) + D - disk sleep (uninterruptible sleep) : process must be woken up + (ignores signals) T - stopped : process suspended t - traced : process is being traced (with something like gdb) Z - zombie : process waiting to be cleaned up @@ -447,8 +450,8 @@ The task states are: ftrace_enabled -------------- -The following tracers give different output depending on whether -or not the sysctl ftrace_enabled is set. To set ftrace_enabled, +The following tracers (listed below) give different output depending +on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, one can either use the sysctl function or set it via the proc file system interface. @@ -475,13 +478,12 @@ interrupt from triggering or the mouse interrupt from letting the kernel know of a new mouse event. The result is a latency with the reaction time. -The irqsoff tracer tracks the time interrupts are disabled to the time -they are re-enabled. When a new maximum latency is hit, it saves off -the trace so that it may be retrieved at a later time. Every time a -new maximum in reached, the old saved trace is discarded and the new -trace is saved. +The irqsoff tracer tracks the time for which interrupts are disabled. +When a new maximum latency is hit, the tracer saves the trace leading up +to that latency point so that every time a new maximum is reached, the old +saved trace is discarded and the new trace is saved. -To reset the maximum, echo 0 into tracing_max_latency. Here's an +To reset the maximum, echo 0 into tracing_max_latency. Here is an example: # echo irqsoff > /debug/tracing/current_tracer @@ -493,14 +495,14 @@ example: # cat /debug/tracing/latency_trace # tracer: irqsoff # -irqsoff latency trace v1.1.5 on 2.6.26-rc8 +irqsoff latency trace v1.1.5 on 2.6.26 -------------------------------------------------------------------- - latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) + latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- - | task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0) + | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- - => started at: copy_page_range - => ended at: copy_page_range + => started at: sys_setpgid + => ended at: sys_setpgid # _------=> CPU# # / _-----=> irqs-off @@ -511,21 +513,19 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 # ||||| delay # cmd pid ||||| time | caller # \ / ||||| \ | / - bash-4269 1...1 0us+: _spin_lock (copy_page_range) - bash-4269 1...1 7us : _spin_unlock (copy_page_range) - bash-4269 1...2 7us : trace_preempt_on (copy_page_range) - + bash-3730 1d... 0us : _write_lock_irq (sys_setpgid) + bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid) + bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid) -vim:ft=help -Here we see that that we had a latency of 6 microsecs (which is -very good). The spin_lock in copy_page_range disabled interrupts. -The difference between the 6 and the displayed timestamp 7us is -because the clock must have incremented between the time of recording -the max latency and recording the function that had that latency. +Here we see that that we had a latency of 12 microsecs (which is +very good). The _write_lock_irq in sys_setpgid disabled interrupts. +The difference between the 12 and the displayed timestamp 14us occurred +because the clock was incremented between the time of recording the max +latency and the time of recording the function that had that latency. -Note the above had ftrace_enabled not set. If we set the ftrace_enabled, -we get a much larger output: +Note the above example had ftrace_enabled not set. If we set the +ftrace_enabled, we get a much larger output: # tracer: irqsoff # @@ -571,12 +571,10 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal) -vim:ft=help - Here we traced a 50 microsecond latency. But we also see all the functions that were called during that time. Note that by enabling -function tracing, we endure an added overhead. This overhead may +function tracing, we incur an added overhead. This overhead may extend the latency times. But nevertheless, this trace has provided some very helpful debugging information. @@ -590,8 +588,9 @@ for preemption to be enabled again before it can preempt a lower priority task. The preemptoff tracer traces the places that disable preemption. -Like the irqsoff, it records the maximum latency that preemption -was disabled. The control of preemptoff is much like the irqsoff. +Like the irqsoff tracer, it records the maximum latency for which preemption +was disabled. The control of preemptoff tracer is much like the irqsoff +tracer. # echo preemptoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -625,8 +624,6 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) -vim:ft=help - This has some more changes. Preemption was disabled when an interrupt came in (notice the 'h'), and was enabled while doing a softirq. (notice the 's'). But we also see that interrupts have been disabled @@ -694,16 +691,16 @@ The above is an example of the preemptoff trace with ftrace_enabled set. Here we see that interrupts were disabled the entire time. The irq_enter code lets us know that we entered an interrupt 'h'. Before that, the functions being traced still show that it is not -in an interrupt, but we can see by the functions themselves that +in an interrupt, but we can see from the functions themselves that this is not the case. -Notice that the __do_softirq when called doesn't have a preempt_count. -It may seem that we missed a preempt enabled. What really happened -is that the preempt count is held on the threads stack and we +Notice that __do_softirq when called does not have a preempt_count. +It may seem that we missed a preempt enabling. What really happened +is that the preempt count is held on the thread's stack and we switched to the softirq stack (4K stacks in effect). The code does not copy the preempt count, but because interrupts are disabled, -we don't need to worry about it. Having a tracer like this is good -to let people know what really happens inside the kernel. +we do not need to worry about it. Having a tracer like this is good +for letting people know what really happens inside the kernel. preemptirqsoff @@ -713,7 +710,7 @@ Knowing the locations that have interrupts disabled or preemption disabled for the longest times is helpful. But sometimes we would like to know when either preemption and/or interrupts are disabled. -The following code: +Consider the following code: local_irq_disable(); call_function_with_irqs_off(); @@ -769,12 +766,10 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq) -vim:ft=help - The trace_hardirqs_off_thunk is called from assembly on x86 when interrupts are disabled in the assembly code. Without the function -tracing, we don't know if interrupts were enabled within the preemption +tracing, we do not know if interrupts were enabled within the preemption points. We do see that it started with preemption enabled. Here is a trace with ftrace_enabled set: @@ -865,19 +860,19 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 This is a very interesting trace. It started with the preemption of the ls task. We see that the task had the "need_resched" bit set -with the 'N' in the trace. Interrupts are disabled in the spin_lock -and the trace started. We see that a schedule took place to run +via the 'N' in the trace. Interrupts were disabled before the spin_lock +at the beginning of the trace. We see that a schedule took place to run sshd. When the interrupts were enabled, we took an interrupt. On return from the interrupt handler, the softirq ran. We took another -interrupt while running the softirq as we see with the capital 'H'. +interrupt while running the softirq as we see from the capital 'H'. wakeup ------ -In Real-Time environment it is very important to know the wakeup -time it takes for the highest priority task that wakes up to the -time it executes. This is also known as "schedule latency". +In a Real-Time environment it is very important to know the wakeup +time it takes for the highest priority task that is woken up to the +time that it executes. This is also known as "schedule latency". I stress the point that this is about RT tasks. It is also important to know the scheduling latency of non-RT tasks, but the average schedule latency is better for non-RT tasks. Tools like @@ -926,8 +921,6 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8 -0 1d..4 4us : schedule (cpu_idle) -vim:ft=help - Running this on an idle system, we see that it only took 4 microseconds to perform the task switch. Note, since the trace marker in the @@ -996,15 +989,15 @@ ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock) ksoftirq-7 1d..4 50us : schedule (__cond_resched) The interrupt went off while running ksoftirqd. This task runs at -SCHED_OTHER. Why didn't we see the 'N' set early? This may be +SCHED_OTHER. Why did not we see the 'N' set early? This may be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks -configured, the interrupt and softirq runs with their own stack. +configured, the interrupt and softirq run with their own stack. Some information is held on the top of the task's stack (need_resched and preempt_count are both stored there). The setting of the NEED_RESCHED bit is done directly to the task's stack, but the reading of the NEED_RESCHED is done by looking at the current stack, which in this case is the stack for the hard interrupt. This hides the fact that NEED_RESCHED -has been set. We don't see the 'N' until we switch back to the task's +has been set. We do not see the 'N' until we switch back to the task's assigned stack. ftrace @@ -1044,14 +1037,14 @@ this tracer is a nop. [...] -Note: It is sometimes better to enable or disable tracing directly from -a program, because the buffer may be overflowed by the echo commands -before you get to the point you want to trace. It is also easier to -stop the tracing at the point that you hit the part that you are -interested in. Since the ftrace buffer is a ring buffer with the -oldest data being overwritten, usually it is sufficient to start the -tracer with an echo command but have you code stop it. Something -like the following is usually appropriate for this. +Note: ftrace uses ring buffers to store the above entries. The newest data +may overwrite the oldest data. Sometimes using echo to stop the trace +is not sufficient because the tracing could have overwritten the data +that you wanted to record. For this reason, it is sometimes better to +disable tracing directly from a program. This allows you to stop the +tracing at the point that you hit the part that you are interested in. +To disable the tracing directly from a C program, something like following +code snippet can be used: int trace_fd; [...] @@ -1060,20 +1053,26 @@ int main(int argc, char *argv[]) { trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY); [...] if (condition_hit()) { - write(trace_fd, "0", 1); + write(trace_fd, "0", 1); } [...] } +Note: Here we hard coded the path name. The debugfs mount is not +guaranteed to be at /debug (and is more commonly at /sys/kernel/debug). +For simple one time traces, the above is sufficent. For anything else, +a search through /proc/mounts may be needed to find where the debugfs +file-system is mounted. dynamic ftrace -------------- -If CONFIG_DYNAMIC_FTRACE is set, then the system will run with +If CONFIG_DYNAMIC_FTRACE is set, the system will run with virtually no overhead when function tracing is disabled. The way this works is the mcount function call (placed at the start of every kernel function, produced by the -pg switch in gcc), starts -of pointing to a simple return. +of pointing to a simple return. (Enabling FTRACE will include the +-pg switch in the compiling of the kernel.) When dynamic ftrace is initialized, it calls kstop_machine to make the machine act like a uniprocessor so that it can freely modify code @@ -1086,15 +1085,15 @@ Later on the ftraced kernel thread is awoken and will again call kstop_machine if new functions have been recorded. The ftraced thread will change all calls to mcount to "nop". Just calling mcount and having mcount return has shown a 10% overhead. By converting -it to a nop, there is no recordable overhead to the system. +it to a nop, there is no measurable overhead to the system. One special side-effect to the recording of the functions being -traced, is that we can now selectively choose which functions we -want to trace and which ones we want the mcount calls to remain as +traced is that we can now selectively choose which functions we +wish to trace and which ones we want the mcount calls to remain as nops. Two files are used, one for enabling and one for disabling the tracing -of recorded functions. They are: +of specified functions. They are: set_ftrace_filter @@ -1116,7 +1115,7 @@ pick_next_task_fair mutex_lock [...] -If I'm only interested in sys_nanosleep and hrtimer_interrupt: +If I am only interested in sys_nanosleep and hrtimer_interrupt: # echo sys_nanosleep hrtimer_interrupt \ > /debug/tracing/set_ftrace_filter @@ -1133,21 +1132,21 @@ If I'm only interested in sys_nanosleep and hrtimer_interrupt: usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call -0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt -To see what functions are being traced, you can cat the file: +To see which functions are being traced, you can cat the file: # cat /debug/tracing/set_ftrace_filter hrtimer_interrupt sys_nanosleep -Perhaps this isn't enough. The filters also allow simple wild cards. +Perhaps this is not enough. The filters also allow simple wild cards. Only the following are currently available * - will match functions that begin with * - will match functions that end with ** - will match functions that have in it -Thats all the wild cards that are allowed. +These are the only wild cards which are supported. * will not work. @@ -1258,15 +1257,15 @@ calls that need to be converted into nops. If there are not any, then it simply goes back to sleep. But if there are some, it will call kstop_machine to convert the calls to nops. -There may be a case that you do not want this added latency. +There may be a case in which you do not want this added latency. Perhaps you are doing some audio recording and this activity might cause skips in the playback. There is an interface to disable -and enable the ftraced kernel thread. +and enable the "ftraced" kernel thread. # echo 0 > /debug/tracing/ftraced_enabled -This will disable the calling of the kstop_machine to update the -mcount calls to nops. Remember that there's a large overhead +This will disable the calling of kstop_machine to update the +mcount calls to nops. Remember that there is a large overhead to calling mcount. Without this kernel thread, that overhead will exist. @@ -1282,8 +1281,8 @@ that uses ftrace function recording. trace_pipe ---------- -The trace_pipe outputs the same as trace, but the effect on the -tracing is different. Every read from trace_pipe is consumed. +The trace_pipe outputs the same content as the trace file, but the effect +on the tracing is different. Every read from trace_pipe is consumed. This means that subsequent reads will be different. The trace is live. @@ -1313,7 +1312,7 @@ is live. bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up -Note, reading the trace_pipe will block until more input is added. +Note, reading the trace_pipe file will block until more input is added. By changing the tracer, trace_pipe will issue an EOF. We needed to set the ftrace tracer _before_ cating the trace_pipe file. @@ -1322,7 +1321,7 @@ trace entries ------------- Having too much or not enough data can be troublesome in diagnosing -some issue in the kernel. The file trace_entries is used to modify +an issue in the kernel. The file trace_entries is used to modify the size of the internal trace buffers. The number listed is the number of entries that can be recorded per CPU. To know the full size, multiply the number of possible CPUS with the @@ -1332,7 +1331,8 @@ number of entries. 65620 Note, to modify this, you must have tracing completely disabled. To do that, -echo "none" into the current_tracer. +echo "none" into the current_tracer. If the current_tracer is not set +to "none", an EINVAL error will be returned. # echo none > /debug/tracing/current_tracer # echo 100000 > /debug/tracing/trace_entries @@ -1341,18 +1341,18 @@ echo "none" into the current_tracer. Notice that we echoed in 100,000 but the size is 100,045. The entries -are held by individual pages. It allocates the number of pages it takes +are held in individual pages. It allocates the number of pages it takes to fulfill the request. If more entries may fit on the last page -it will add them. +then they will be added. # echo 1 > /debug/tracing/trace_entries # cat /debug/tracing/trace_entries 85 -This shows us that 85 entries can fit on a single page. +This shows us that 85 entries can fit in a single page. -The number of pages that will be allocated is a percentage of available -memory. Allocating too much will produce an error. +The number of pages which will be allocated is limited to a percentage +of available memory. Allocating too much will produce an error. # echo 1000000000000 > /debug/tracing/trace_entries -bash: echo: write error: Cannot allocate memory diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index c35ca9e40d4ca8ae0cbf7c06c639d8531ae8d9dd..18022e249c53dc1ad991d74d160551fa32f32070 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -347,15 +347,12 @@ necessarily be nonportable. Dynamic definition of GPIOs is not currently standard; for example, as a side effect of configuring an add-on board with some GPIO expanders. -These calls are purely for kernel space, but a userspace API could be built -on top of them. - GPIO implementor's framework (OPTIONAL) ======================================= As noted earlier, there is an optional implementation framework making it easier for platforms to support different kinds of GPIO controller using -the same programming interface. +the same programming interface. This framework is called "gpiolib". As a debugging aid, if debugfs is available a /sys/kernel/debug/gpio file will be found there. That will list all the controllers registered through @@ -392,11 +389,21 @@ either NULL or the label associated with that GPIO when it was requested. Platform Support ---------------- -To support this framework, a platform's Kconfig will "select HAVE_GPIO_LIB" +To support this framework, a platform's Kconfig will "select" either +ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB and arrange that its includes and defines three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep(). They may also want to provide a custom value for ARCH_NR_GPIOS. +ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled +into the kernel on that architecture. + +ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user +can enable it and build it into the kernel optionally. + +If neither of these options are selected, the platform does not support +GPIOs through GPIO-lib and the code cannot be enabled by the user. + Trivial implementations of those functions can directly use framework code, which always dispatches through the gpio_chip: @@ -439,4 +446,120 @@ becomes available. That may mean the device should not be registered until calls for that GPIO can work. One way to address such dependencies is for such gpio_chip controllers to provide setup() and teardown() callbacks to board specific code; those board specific callbacks would register devices -once all the necessary resources are available. +once all the necessary resources are available, and remove them later when +the GPIO controller device becomes unavailable. + + +Sysfs Interface for Userspace (OPTIONAL) +======================================== +Platforms which use the "gpiolib" implementors framework may choose to +configure a sysfs user interface to GPIOs. This is different from the +debugfs interface, since it provides control over GPIO direction and +value instead of just showing a gpio state summary. Plus, it could be +present on production systems without debugging support. + +Given approprate hardware documentation for the system, userspace could +know for example that GPIO #23 controls the write protect line used to +protect boot loader segments in flash memory. System upgrade procedures +may need to temporarily remove that protection, first importing a GPIO, +then changing its output state, then updating the code before re-enabling +the write protection. In normal use, GPIO #23 would never be touched, +and the kernel would have no need to know about it. + +Again depending on appropriate hardware documentation, on some systems +userspace GPIO can be used to determine system configuration data that +standard kernels won't know about. And for some tasks, simple userspace +GPIO drivers could be all that the system really needs. + +Note that standard kernel drivers exist for common "LEDs and Buttons" +GPIO tasks: "leds-gpio" and "gpio_keys", respectively. Use those +instead of talking directly to the GPIOs; they integrate with kernel +frameworks better than your userspace code could. + + +Paths in Sysfs +-------------- +There are three kinds of entry in /sys/class/gpio: + + - Control interfaces used to get userspace control over GPIOs; + + - GPIOs themselves; and + + - GPIO controllers ("gpio_chip" instances). + +That's in addition to standard files including the "device" symlink. + +The control interfaces are write-only: + + /sys/class/gpio/ + + "export" ... Userspace may ask the kernel to export control of + a GPIO to userspace by writing its number to this file. + + Example: "echo 19 > export" will create a "gpio19" node + for GPIO #19, if that's not requested by kernel code. + + "unexport" ... Reverses the effect of exporting to userspace. + + Example: "echo 19 > unexport" will remove a "gpio19" + node exported using the "export" file. + +GPIO signals have paths like /sys/class/gpio/gpio42/ (for GPIO #42) +and have the following read/write attributes: + + /sys/class/gpio/gpioN/ + + "direction" ... reads as either "in" or "out". This value may + normally be written. Writing as "out" defaults to + initializing the value as low. To ensure glitch free + operation, values "low" and "high" may be written to + configure the GPIO as an output with that initial value. + + Note that this attribute *will not exist* if the kernel + doesn't support changing the direction of a GPIO, or + it was exported by kernel code that didn't explicitly + allow userspace to reconfigure this GPIO's direction. + + "value" ... reads as either 0 (low) or 1 (high). If the GPIO + is configured as an output, this value may be written; + any nonzero value is treated as high. + +GPIO controllers have paths like /sys/class/gpio/chipchip42/ (for the +controller implementing GPIOs starting at #42) and have the following +read-only attributes: + + /sys/class/gpio/gpiochipN/ + + "base" ... same as N, the first GPIO managed by this chip + + "label" ... provided for diagnostics (not always unique) + + "ngpio" ... how many GPIOs this manges (N to N + ngpio - 1) + +Board documentation should in most cases cover what GPIOs are used for +what purposes. However, those numbers are not always stable; GPIOs on +a daughtercard might be different depending on the base board being used, +or other cards in the stack. In such cases, you may need to use the +gpiochip nodes (possibly in conjunction with schematics) to determine +the correct GPIO number to use for a given signal. + + +Exporting from Kernel code +-------------------------- +Kernel code can explicitly manage exports of GPIOs which have already been +requested using gpio_request(): + + /* export the GPIO to userspace */ + int gpio_export(unsigned gpio, bool direction_may_change); + + /* reverse gpio_export() */ + void gpio_unexport(); + +After a kernel driver requests a GPIO, it may only be made available in +the sysfs interface by gpio_export(). The driver can control whether the +signal direction may change. This helps drivers prevent userspace code +from accidentally clobbering important system state. + +This explicit exporting can help with debugging (by making some kinds +of experiments easier), or can provide an always-there interface that's +suitable for documenting as part of a board support package. diff --git a/Documentation/hwmon/dme1737 b/Documentation/hwmon/dme1737 index 8f446070e64a56ebc2c19a98c3f5e40688648b06..b1fe009994396030943ad570ddb84abe42611de5 100644 --- a/Documentation/hwmon/dme1737 +++ b/Documentation/hwmon/dme1737 @@ -22,6 +22,10 @@ Module Parameters and PWM output control functions. Using this parameter shouldn't be required since the BIOS usually takes care of this. +* probe_all_addr: bool Include non-standard LPC addresses 0x162e and 0x164e + when probing for ISA devices. This is required for the + following boards: + - VIA EPIA SN18000 Note that there is no need to use this parameter if the driver loads without complaining. The driver will say so if it is necessary. diff --git a/Documentation/hwmon/lm85 b/Documentation/hwmon/lm85 index 9549237530cf8a08dbe1460f3f048802cea62567..6d41db7f17f8b5372a765504f2d6c173bfc2b372 100644 --- a/Documentation/hwmon/lm85 +++ b/Documentation/hwmon/lm85 @@ -96,11 +96,6 @@ initial testing of the ADM1027 it was 1.00 degC steps. Analog Devices has confirmed this "bug". The ADT7463 is reported to work as described in the documentation. The current lm85 driver does not show the offset register. -The ADT7463 has a THERM asserted counter. This counter has a 22.76ms -resolution and a range of 5.8 seconds. The driver implements a 32-bit -accumulator of the counter value to extend the range to over a year. The -counter will stay at it's max value until read. - See the vendor datasheets for more information. There is application note from National (AN-1260) with some additional information about the LM85. The Analog Devices datasheet is very detailed and describes a procedure for @@ -206,13 +201,15 @@ Configuration choices: The National LM85's have two vendor specific configuration features. Tach. mode and Spinup Control. For more details on these, -see the LM85 datasheet or Application Note AN-1260. +see the LM85 datasheet or Application Note AN-1260. These features +are not currently supported by the lm85 driver. The Analog Devices ADM1027 has several vendor specific enhancements. The number of pulses-per-rev of the fans can be set, Tach monitoring can be optimized for PWM operation, and an offset can be applied to the temperatures to compensate for systemic errors in the -measurements. +measurements. These features are not currently supported by the lm85 +driver. In addition to the ADM1027 features, the ADT7463 also has Tmin control and THERM asserted counts. Automatic Tmin control acts to adjust the diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index a0cd8af2f40806a53ac2dc1841be8530d4e59992..10ca43cd1a72aa9d2d5bafc77a1116c126d7d892 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 @@ -49,7 +49,7 @@ $ modprobe max6875 force=0,0x50 The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple addresses. For example, for address 0x50, it also reserves 0x51. -The even-address instance is called 'max6875', the odd one is 'max6875 subclient'. +The even-address instance is called 'max6875', the odd one is 'dummy'. Programming the chip using i2c-dev diff --git a/Documentation/i2c/chips/pca9539 b/Documentation/i2c/chips/pca9539 index 1d81c530c4a584f15e25ad85e31e57693acfff6f..6aff890088b15d74d08119ecf97f8e250fac9605 100644 --- a/Documentation/i2c/chips/pca9539 +++ b/Documentation/i2c/chips/pca9539 @@ -7,7 +7,7 @@ drivers/gpio/pca9539.c instead. Supported chips: * Philips PCA9539 Prefix: 'pca9539' - Addresses scanned: 0x74 - 0x77 + Addresses scanned: none Datasheet: http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf @@ -23,6 +23,14 @@ The input sense can also be inverted. The 16 lines are split between two bytes. +Detection +--------- + +The PCA9539 is difficult to detect and not commonly found in PC machines, +so you have to pass the I2C bus and address of the installed PCA9539 +devices explicitly to the driver at load time via the force=... parameter. + + Sysfs entries ------------- diff --git a/Documentation/i2c/chips/pcf8574 b/Documentation/i2c/chips/pcf8574 index 5c1ad1376b62e5588d1b95b0f573211b39e70851..235815c075ff5c8172b4641f8c74283890fbc7b5 100644 --- a/Documentation/i2c/chips/pcf8574 +++ b/Documentation/i2c/chips/pcf8574 @@ -4,13 +4,13 @@ Kernel driver pcf8574 Supported chips: * Philips PCF8574 Prefix: 'pcf8574' - Addresses scanned: I2C 0x20 - 0x27 + Addresses scanned: none Datasheet: Publicly available at the Philips Semiconductors website http://www.semiconductors.philips.com/pip/PCF8574P.html * Philips PCF8574A Prefix: 'pcf8574a' - Addresses scanned: I2C 0x38 - 0x3f + Addresses scanned: none Datasheet: Publicly available at the Philips Semiconductors website http://www.semiconductors.philips.com/pip/PCF8574P.html @@ -38,12 +38,10 @@ For more informations see the datasheet. Accessing PCF8574(A) via /sys interface ------------------------------------- -! Be careful ! The PCF8574(A) is plainly impossible to detect ! Stupid chip. -So every chip with address in the interval [20..27] and [38..3f] are -detected as PCF8574(A). If you have other chips in this address -range, the workaround is to load this module after the one -for your others chips. +So, you have to pass the I2C bus and address of the installed PCF857A +and PCF8574A devices explicitly to the driver at load time via the +force=... parameter. On detection (i.e. insmod, modprobe et al.), directories are being created for each detected PCF8574(A): diff --git a/Documentation/i2c/chips/pcf8575 b/Documentation/i2c/chips/pcf8575 index 25f5698a61cf646ee5d78b24bcfdbc1b6e6154b1..40b268eb276fc3b514142f16958d6f660d79dc04 100644 --- a/Documentation/i2c/chips/pcf8575 +++ b/Documentation/i2c/chips/pcf8575 @@ -40,12 +40,9 @@ Detection --------- There is no method known to detect whether a chip on a given I2C address is -a PCF8575 or whether it is any other I2C device. So there are two alternatives -to let the driver find the installed PCF8575 devices: -- Load this driver after any other I2C driver for I2C devices with addresses - in the range 0x20 .. 0x27. -- Pass the I2C bus and address of the installed PCF8575 devices explicitly to - the driver at load time via the probe=... or force=... parameters. +a PCF8575 or whether it is any other I2C device, so you have to pass the I2C +bus and address of the installed PCF8575 devices explicitly to the driver at +load time via the force=... parameter. /sys interface -------------- diff --git a/Documentation/i2c/upgrading-clients b/Documentation/i2c/upgrading-clients new file mode 100644 index 0000000000000000000000000000000000000000..9a45f9bb6a255d421bd2903e69120c0a8e281017 --- /dev/null +++ b/Documentation/i2c/upgrading-clients @@ -0,0 +1,281 @@ +Upgrading I2C Drivers to the new 2.6 Driver Model +================================================= + +Ben Dooks + +Introduction +------------ + +This guide outlines how to alter existing Linux 2.6 client drivers from +the old to the new new binding methods. + + +Example old-style driver +------------------------ + + +struct example_state { + struct i2c_client client; + .... +}; + +static struct i2c_driver example_driver; + +static unsigned short ignore[] = { I2C_CLIENT_END }; +static unsigned short normal_addr[] = { OUR_ADDR, I2C_CLIENT_END }; + +I2C_CLIENT_INSMOD; + +static int example_attach(struct i2c_adapter *adap, int addr, int kind) +{ + struct example_state *state; + struct device *dev = &adap->dev; /* to use for dev_ reports */ + int ret; + + state = kzalloc(sizeof(struct example_state), GFP_KERNEL); + if (state == NULL) { + dev_err(dev, "failed to create our state\n"); + return -ENOMEM; + } + + example->client.addr = addr; + example->client.flags = 0; + example->client.adapter = adap; + + i2c_set_clientdata(&state->i2c_client, state); + strlcpy(client->i2c_client.name, "example", I2C_NAME_SIZE); + + ret = i2c_attach_client(&state->i2c_client); + if (ret < 0) { + dev_err(dev, "failed to attach client\n"); + kfree(state); + return ret; + } + + dev = &state->i2c_client.dev; + + /* rest of the initialisation goes here. */ + + dev_info(dev, "example client created\n"); + + return 0; +} + +static int __devexit example_detach(struct i2c_client *client) +{ + struct example_state *state = i2c_get_clientdata(client); + + i2c_detach_client(client); + kfree(state); + return 0; +} + +static int example_attach_adapter(struct i2c_adapter *adap) +{ + return i2c_probe(adap, &addr_data, example_attach); +} + +static struct i2c_driver example_driver = { + .driver = { + .owner = THIS_MODULE, + .name = "example", + }, + .attach_adapter = example_attach_adapter, + .detach_client = __devexit_p(example_detach), + .suspend = example_suspend, + .resume = example_resume, +}; + + +Updating the client +------------------- + +The new style binding model will check against a list of supported +devices and their associated address supplied by the code registering +the busses. This means that the driver .attach_adapter and +.detach_adapter methods can be removed, along with the addr_data, +as follows: + +- static struct i2c_driver example_driver; + +- static unsigned short ignore[] = { I2C_CLIENT_END }; +- static unsigned short normal_addr[] = { OUR_ADDR, I2C_CLIENT_END }; + +- I2C_CLIENT_INSMOD; + +- static int example_attach_adapter(struct i2c_adapter *adap) +- { +- return i2c_probe(adap, &addr_data, example_attach); +- } + + static struct i2c_driver example_driver = { +- .attach_adapter = example_attach_adapter, +- .detach_client = __devexit_p(example_detach), + } + +Add the probe and remove methods to the i2c_driver, as so: + + static struct i2c_driver example_driver = { ++ .probe = example_probe, ++ .remove = __devexit_p(example_remove), + } + +Change the example_attach method to accept the new parameters +which include the i2c_client that it will be working with: + +- static int example_attach(struct i2c_adapter *adap, int addr, int kind) ++ static int example_probe(struct i2c_client *client, ++ const struct i2c_device_id *id) + +Change the name of example_attach to example_probe to align it with the +i2c_driver entry names. The rest of the probe routine will now need to be +changed as the i2c_client has already been setup for use. + +The necessary client fields have already been setup before +the probe function is called, so the following client setup +can be removed: + +- example->client.addr = addr; +- example->client.flags = 0; +- example->client.adapter = adap; +- +- strlcpy(client->i2c_client.name, "example", I2C_NAME_SIZE); + +The i2c_set_clientdata is now: + +- i2c_set_clientdata(&state->client, state); ++ i2c_set_clientdata(client, state); + +The call to i2c_attach_client is no longer needed, if the probe +routine exits successfully, then the driver will be automatically +attached by the core. Change the probe routine as so: + +- ret = i2c_attach_client(&state->i2c_client); +- if (ret < 0) { +- dev_err(dev, "failed to attach client\n"); +- kfree(state); +- return ret; +- } + + +Remove the storage of 'struct i2c_client' from the 'struct example_state' +as we are provided with the i2c_client in our example_probe. Instead we +store a pointer to it for when it is needed. + +struct example_state { +- struct i2c_client client; ++ struct i2c_client *client; + +the new i2c client as so: + +- struct device *dev = &adap->dev; /* to use for dev_ reports */ ++ struct device *dev = &i2c_client->dev; /* to use for dev_ reports */ + +And remove the change after our client is attached, as the driver no +longer needs to register a new client structure with the core: + +- dev = &state->i2c_client.dev; + +In the probe routine, ensure that the new state has the client stored +in it: + +static int example_probe(struct i2c_client *i2c_client, + const struct i2c_device_id *id) +{ + struct example_state *state; + struct device *dev = &i2c_client->dev; + int ret; + + state = kzalloc(sizeof(struct example_state), GFP_KERNEL); + if (state == NULL) { + dev_err(dev, "failed to create our state\n"); + return -ENOMEM; + } + ++ state->client = i2c_client; + +Update the detach method, by changing the name to _remove and +to delete the i2c_detach_client call. It is possible that you +can also remove the ret variable as it is not not needed for +any of the core functions. + +- static int __devexit example_detach(struct i2c_client *client) ++ static int __devexit example_remove(struct i2c_client *client) +{ + struct example_state *state = i2c_get_clientdata(client); + +- i2c_detach_client(client); + +And finally ensure that we have the correct ID table for the i2c-core +and other utilities: + ++ struct i2c_device_id example_idtable[] = { ++ { "example", 0 }, ++ { } ++}; ++ ++MODULE_DEVICE_TABLE(i2c, example_idtable); + +static struct i2c_driver example_driver = { + .driver = { + .owner = THIS_MODULE, + .name = "example", + }, ++ .id_table = example_ids, + + +Our driver should now look like this: + +struct example_state { + struct i2c_client *client; + .... +}; + +static int example_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + struct example_state *state; + struct device *dev = &client->dev; + + state = kzalloc(sizeof(struct example_state), GFP_KERNEL); + if (state == NULL) { + dev_err(dev, "failed to create our state\n"); + return -ENOMEM; + } + + state->client = client; + i2c_set_clientdata(client, state); + + /* rest of the initialisation goes here. */ + + dev_info(dev, "example client created\n"); + + return 0; +} + +static int __devexit example_remove(struct i2c_client *client) +{ + struct example_state *state = i2c_get_clientdata(client); + + kfree(state); + return 0; +} + +static struct i2c_device_id example_idtable[] = { + { "example", 0 }, + { } +}; + +MODULE_DEVICE_TABLE(i2c, example_idtable); + +static struct i2c_driver example_driver = { + .driver = { + .owner = THIS_MODULE, + .name = "example", + }, + .id_table = example_idtable, + .probe = example_probe, + .remove = __devexit_p(example_remove), + .suspend = example_suspend, + .resume = example_resume, +}; diff --git a/Documentation/ia64/kvm.txt b/Documentation/ia64/kvm.txt index bec9d815da33a0495c7ea4d2eebecf7f4c1ab2fc..914d07f49268b728ab5c1d8a6e7b23b59cef0128 100644 --- a/Documentation/ia64/kvm.txt +++ b/Documentation/ia64/kvm.txt @@ -50,9 +50,9 @@ Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qe /usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image (xx is the number of virtual processors for the guest, now the maximum value is 4) -5. Known possibile issue on some platforms with old Firmware. +5. Known possible issue on some platforms with old Firmware. -If meet strange host crashe issues, try to solve it through either of the following ways: +In the event of strange host crash issues, try to solve it through either of the following ways: (1): Upgrade your Firmware to the latest one. @@ -65,8 +65,8 @@ index 0b53344..f02b0f7 100644 mov ar.pfs = loc1 mov rp = loc0 ;; -- srlz.d // seralize restoration of psr.l -+ srlz.i // seralize restoration of psr.l +- srlz.d // serialize restoration of psr.l ++ srlz.i // serialize restoration of psr.l + ;; br.ret.sptk.many b0 END(ia64_pal_call_static) diff --git a/Documentation/ia64/paravirt_ops.txt b/Documentation/ia64/paravirt_ops.txt new file mode 100644 index 0000000000000000000000000000000000000000..39ded02ec33fc8a06c3e09d34ad1760d9e417b08 --- /dev/null +++ b/Documentation/ia64/paravirt_ops.txt @@ -0,0 +1,137 @@ +Paravirt_ops on IA64 +==================== + 21 May 2008, Isaku Yamahata + + +Introduction +------------ +The aim of this documentation is to help with maintainability and/or to +encourage people to use paravirt_ops/IA64. + +paravirt_ops (pv_ops in short) is a way for virtualization support of +Linux kernel on x86. Several ways for virtualization support were +proposed, paravirt_ops is the winner. +On the other hand, now there are also several IA64 virtualization +technologies like kvm/IA64, xen/IA64 and many other academic IA64 +hypervisors so that it is good to add generic virtualization +infrastructure on Linux/IA64. + + +What is paravirt_ops? +--------------------- +It has been developed on x86 as virtualization support via API, not ABI. +It allows each hypervisor to override operations which are important for +hypervisors at API level. And it allows a single kernel binary to run on +all supported execution environments including native machine. +Essentially paravirt_ops is a set of function pointers which represent +operations corresponding to low level sensitive instructions and high +level functionalities in various area. But one significant difference +from usual function pointer table is that it allows optimization with +binary patch. It is because some of these operations are very +performance sensitive and indirect call overhead is not negligible. +With binary patch, indirect C function call can be transformed into +direct C function call or in-place execution to eliminate the overhead. + +Thus, operations of paravirt_ops are classified into three categories. +- simple indirect call + These operations correspond to high level functionality so that the + overhead of indirect call isn't very important. + +- indirect call which allows optimization with binary patch + Usually these operations correspond to low level instructions. They + are called frequently and performance critical. So the overhead is + very important. + +- a set of macros for hand written assembly code + Hand written assembly codes (.S files) also need paravirtualization + because they include sensitive instructions or some of code paths in + them are very performance critical. + + +The relation to the IA64 machine vector +--------------------------------------- +Linux/IA64 has the IA64 machine vector functionality which allows the +kernel to switch implementations (e.g. initialization, ipi, dma api...) +depending on executing platform. +We can replace some implementations very easily defining a new machine +vector. Thus another approach for virtualization support would be +enhancing the machine vector functionality. +But paravirt_ops approach was taken because +- virtualization support needs wider support than machine vector does. + e.g. low level instruction paravirtualization. It must be + initialized very early before platform detection. + +- virtualization support needs more functionality like binary patch. + Probably the calling overhead might not be very large compared to the + emulation overhead of virtualization. However in the native case, the + overhead should be eliminated completely. + A single kernel binary should run on each environment including native, + and the overhead of paravirt_ops on native environment should be as + small as possible. + +- for full virtualization technology, e.g. KVM/IA64 or + Xen/IA64 HVM domain, the result would be + (the emulated platform machine vector. probably dig) + (pv_ops). + This means that the virtualization support layer should be under + the machine vector layer. + +Possibly it might be better to move some function pointers from +paravirt_ops to machine vector. In fact, Xen domU case utilizes both +pv_ops and machine vector. + + +IA64 paravirt_ops +----------------- +In this section, the concrete paravirt_ops will be discussed. +Because of the architecture difference between ia64 and x86, the +resulting set of functions is very different from x86 pv_ops. + +- C function pointer tables +They are not very performance critical so that simple C indirect +function call is acceptable. The following structures are defined at +this moment. For details see linux/include/asm-ia64/paravirt.h + - struct pv_info + This structure describes the execution environment. + - struct pv_init_ops + This structure describes the various initialization hooks. + - struct pv_iosapic_ops + This structure describes hooks to iosapic operations. + - struct pv_irq_ops + This structure describes hooks to irq related operations + - struct pv_time_op + This structure describes hooks to steal time accounting. + +- a set of indirect calls which need optimization +Currently this class of functions correspond to a subset of IA64 +intrinsics. At this moment the optimization with binary patch isn't +implemented yet. +struct pv_cpu_op is defined. For details see +linux/include/asm-ia64/paravirt_privop.h +Mostly they correspond to ia64 intrinsics 1-to-1. +Caveat: Now they are defined as C indirect function pointers, but in +order to support binary patch optimization, they will be changed +using GCC extended inline assembly code. + +- a set of macros for hand written assembly code (.S files) +For maintenance purpose, the taken approach for .S files is single +source code and compile multiple times with different macros definitions. +Each pv_ops instance must define those macros to compile. +The important thing here is that sensitive, but non-privileged +instructions must be paravirtualized and that some privileged +instructions also need paravirtualization for reasonable performance. +Developers who modify .S files must be aware of that. At this moment +an easy checker is implemented to detect paravirtualization breakage. +But it doesn't cover all the cases. + +Sometimes this set of macros is called pv_cpu_asm_op. But there is no +corresponding structure in the source code. +Those macros mostly 1:1 correspond to a subset of privileged +instructions. See linux/include/asm-ia64/native/inst.h. +And some functions written in assembly also need to be overrided so +that each pv_ops instance have to define some macros. Again see +linux/include/asm-ia64/native/inst.h. + + +Those structures must be initialized very early before start_kernel. +Probably initialized in head.S using multi entry point or some other trick. +For native case implementation see linux/arch/ia64/kernel/paravirt.c. diff --git a/Documentation/input/cs461x.txt b/Documentation/input/cs461x.txt index afe0d6543e093fa5dfa4758bbb0cbec90ff0bce3..202e9dbacec39a96990c5722c4b1ab832b939b61 100644 --- a/Documentation/input/cs461x.txt +++ b/Documentation/input/cs461x.txt @@ -31,7 +31,7 @@ The driver works with ALSA drivers simultaneously. For example, the xracer uses joystick as input device and PCM device as sound output in one time. There are no sound or input collisions detected. The source code have comments about them; but I've found the joystick can be initialized -separately of ALSA modules. So, you canm use only one joystick driver +separately of ALSA modules. So, you can use only one joystick driver without ALSA drivers. The ALSA drivers are not needed to compile or run this driver. diff --git a/Documentation/input/gameport-programming.txt b/Documentation/input/gameport-programming.txt index 14e0a8b70225f5cce93c2cf8bdb85a0aa984b8ee..03a74fc3b49679c0326a59faa43b2e423751e0c5 100644 --- a/Documentation/input/gameport-programming.txt +++ b/Documentation/input/gameport-programming.txt @@ -1,5 +1,3 @@ -$Id: gameport-programming.txt,v 1.3 2001/04/24 13:51:37 vojtech Exp $ - Programming gameport drivers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/input/input.txt b/Documentation/input/input.txt index ff8cea0225f90bdce6c28adc5b3af5816f9eea44..686ee9932dffd86fa78775859a97815ecc514be2 100644 --- a/Documentation/input/input.txt +++ b/Documentation/input/input.txt @@ -1,7 +1,6 @@ Linux Input drivers v1.0 (c) 1999-2001 Vojtech Pavlik Sponsored by SuSE - $Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $ ---------------------------------------------------------------------------- 0. Disclaimer diff --git a/Documentation/input/joystick-api.txt b/Documentation/input/joystick-api.txt index acbd32b88454dc5c48c7a10bb112757910d3c7e5..c507330740cd4fb14e37e617d01d95262057d129 100644 --- a/Documentation/input/joystick-api.txt +++ b/Documentation/input/joystick-api.txt @@ -5,8 +5,6 @@ 7 Aug 1998 - $Id: joystick-api.txt,v 1.2 2001/05/08 21:21:23 vojtech Exp $ - 1. Initialization ~~~~~~~~~~~~~~~~~ diff --git a/Documentation/input/joystick-parport.txt b/Documentation/input/joystick-parport.txt index ede5f33daad3dbc0f7346654a348fd4eb28d7387..1c856f32ff2c3bc880746533c149ec6cc146806f 100644 --- a/Documentation/input/joystick-parport.txt +++ b/Documentation/input/joystick-parport.txt @@ -2,7 +2,6 @@ (c) 1998-2000 Vojtech Pavlik (c) 1998 Andree Borrmann Sponsored by SuSE - $Id: joystick-parport.txt,v 1.6 2001/09/25 09:31:32 vojtech Exp $ ---------------------------------------------------------------------------- 0. Disclaimer diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt index 389de9bd987894017bb25ddc95b911c98526bdcb..154d767b2acbc4be459044d78e7f63b975faa207 100644 --- a/Documentation/input/joystick.txt +++ b/Documentation/input/joystick.txt @@ -1,7 +1,6 @@ Linux Joystick driver v2.0.0 (c) 1996-2000 Vojtech Pavlik Sponsored by SuSE - $Id: joystick.txt,v 1.12 2002/03/03 12:13:07 jdeneux Exp $ ---------------------------------------------------------------------------- 0. Disclaimer diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt index c19efdeace2cd220a73554b66f5c0fbef1d81d0b..91a6ecbae0bb9601b7d50e084af1eeb96869f730 100644 --- a/Documentation/ioctl/hdio.txt +++ b/Documentation/ioctl/hdio.txt @@ -508,12 +508,13 @@ HDIO_DRIVE_RESET execute a device reset error returns: EACCES Access denied: requires CAP_SYS_ADMIN + ENXIO No such device: phy dead or ctl_addr == 0 + EIO I/O error: reset timed out or hardware error notes: - Abort any current command, prevent anything else from being - queued, execute a reset on the device, and issue BLKRRPART - ioctl on the block device. + Execute a reset on the device as soon as the current IO + operation has completed. Executes an ATAPI soft reset if applicable, otherwise executes an ATA soft reset on the controller. diff --git a/Documentation/ioctl/ioctl-decoding.txt b/Documentation/ioctl/ioctl-decoding.txt index bfdf7f3ee4f05e0f52e08c8aee1ad55c248b8af4..e35efb0cec2e64a99538d24c2c7c7b38846abc26 100644 --- a/Documentation/ioctl/ioctl-decoding.txt +++ b/Documentation/ioctl/ioctl-decoding.txt @@ -1,6 +1,6 @@ To decode a hex IOCTL code: -Most architecures use this generic format, but check +Most architectures use this generic format, but check include/ARCH/ioctl.h for specifics, e.g. powerpc uses 3 bits to encode read/write and 13 bits for size. @@ -18,7 +18,7 @@ uses 3 bits to encode read/write and 13 bits for size. 7-0 function # - So for example 0x82187201 is a read with arg length of 0x218, +So for example 0x82187201 is a read with arg length of 0x218, character 'r' function 1. Grepping the source reveals this is: #define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct dirent [2]) diff --git a/Documentation/iostats.txt b/Documentation/iostats.txt index 5925c3cd030d55884b3f962ed40ced5a20709cf4..59a69ec67c408a54ee724df4d4a64a69a8317f68 100644 --- a/Documentation/iostats.txt +++ b/Documentation/iostats.txt @@ -143,7 +143,7 @@ disk and partition statistics are consistent again. Since we still don't keep record of the partition-relative address, an operation is attributed to the partition which contains the first sector of the request after the eventual merges. As requests can be merged across partition, this could lead -to some (probably insignificant) innacuracy. +to some (probably insignificant) inaccuracy. Additional notes ---------------- diff --git a/Documentation/isdn/README.mISDN b/Documentation/isdn/README.mISDN new file mode 100644 index 0000000000000000000000000000000000000000..cd8bf920e77bcac0ec8d4068db18441bd28083d5 --- /dev/null +++ b/Documentation/isdn/README.mISDN @@ -0,0 +1,6 @@ +mISDN is a new modular ISDN driver, in the long term it should replace +the old I4L driver architecture for passiv ISDN cards. +It was designed to allow a broad range of applications and interfaces +but only have the basic function in kernel, the interface to the user +space is based on sockets with a own address family AF_ISDN. + diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 9691c7f5166c1ea60acfbc383799dc5ec4dcbd21..0705040531a534c2c153c1289fe0febf92926a61 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -65,26 +65,26 @@ Install kexec-tools 2) Download the kexec-tools user-space package from the following URL: -http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-tools-testing.tar.gz +http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-tools.tar.gz -This is a symlink to the latest version, which at the time of writing is -20061214, the only release of kexec-tools-testing so far. As other versions -are released, the older ones will remain available at -http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/ +This is a symlink to the latest version. -Note: Latest kexec-tools-testing git tree is available at +The latest kexec-tools git tree is available at: -git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools-testing.git +git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools.git or -http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools-testing.git;a=summary +http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools.git + +More information about kexec-tools can be found at +http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/README.html 3) Unpack the tarball with the tar command, as follows: - tar xvpzf kexec-tools-testing.tar.gz + tar xvpzf kexec-tools.tar.gz 4) Change to the kexec-tools directory, as follows: - cd kexec-tools-testing-VERSION + cd kexec-tools-VERSION 5) Configure the package, as follows: diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 312fe77764a48cba9fb04e000fc2dffeba7fa978..e7bea3e853044e54f3ee863ad3d295b0df84a522 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -87,7 +87,8 @@ parameter is applicable: SH SuperH architecture is enabled. SMP The kernel is an SMP kernel. SPARC Sparc architecture is enabled. - SWSUSP Software suspend is enabled. + SWSUSP Software suspend (hibernation) is enabled. + SUSPEND System suspend states are enabled. TS Appropriate touchscreen support is enabled. USB USB support is enabled. USBHID USB Human Interface Device support is enabled. @@ -147,10 +148,16 @@ and is between 256 and 4096 characters. It is defined in the file default: 0 acpi_sleep= [HW,ACPI] Sleep options - Format: { s3_bios, s3_mode, s3_beep } + Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig, old_ordering } See Documentation/power/video.txt for s3_bios and s3_mode. s3_beep is for debugging; it makes the PC's speaker beep as soon as the kernel's real-mode entry point is called. + s4_nohwsig prevents ACPI hardware signature from being + used during resume from hibernation. + old_ordering causes the ACPI 1.0 ordering of the _PTS + control method, wrt putting devices into low power + states, to be enforced (the ACPI 2.0 ordering of _PTS is + used by default). acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode Format: { level | edge | high | low } @@ -770,8 +777,22 @@ and is between 256 and 4096 characters. It is defined in the file hisax= [HW,ISDN] See Documentation/isdn/README.HiSax. - hugepages= [HW,X86-32,IA-64] Maximal number of HugeTLB pages. - hugepagesz= [HW,IA-64,PPC] The size of the HugeTLB pages. + hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot. + hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages. + On x86-64 and powerpc, this option can be specified + multiple times interleaved with hugepages= to reserve + huge pages of different sizes. Valid pages sizes on + x86-64 are 2M (when the CPU supports "pse") and 1G + (when the CPU supports the "pdpe1gb" cpuinfo flag) + Note that 1GB pages can only be allocated at boot time + using hugepages= and not freed afterwards. + default_hugepagesz= + [same as hugepagesz=] The size of the default + HugeTLB page size. This is the size represented by + the legacy /proc/ hugepages APIs, used for SHM, and + default size when mounting hugetlbfs filesystems. + Defaults to the default architecture's huge page size + if not specified. i8042.direct [HW] Put keyboard port into non-translated mode i8042.dumbkbd [HW] Pretend that controller can only read data from @@ -818,7 +839,7 @@ and is between 256 and 4096 characters. It is defined in the file See Documentation/ide/ide.txt. idle= [X86] - Format: idle=poll or idle=mwait + Format: idle=poll or idle=mwait, idle=halt, idle=nomwait Poll forces a polling idle loop that can slightly improves the performance of waking up a idle CPU, but will use a lot of power and make the system run hot. Not recommended. @@ -826,6 +847,9 @@ and is between 256 and 4096 characters. It is defined in the file to not use it because it doesn't save as much power as a normal idle loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same as idle=poll. + idle=halt. Halt is forced to be used for CPU idle. + In such case C2/C3 won't be used again. + idle=nomwait. Disable mwait for CPU C-states ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem Claim all unknown PCI IDE storage controllers. @@ -1199,7 +1223,7 @@ and is between 256 and 4096 characters. It is defined in the file or memmap=0x10000$0x18690000 - memtest= [KNL,X86_64] Enable memtest + memtest= [KNL,X86] Enable memtest Format: range: 0,4 : pattern number default : 0 @@ -1218,6 +1242,14 @@ and is between 256 and 4096 characters. It is defined in the file mga= [HW,DRM] + mminit_loglevel= + [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this + parameter allows control of the logging verbosity for + the additional memory initialisation checks. A value + of 0 disables mminit logging and a level of 4 will + log everything. Information is printed at KERN_DEBUG + so loglevel=8 may also need to be specified. + mousedev.tap_time= [MOUSE] Maximum time between finger touching and leaving touchpad surface for touch to be considered @@ -1272,6 +1304,13 @@ and is between 256 and 4096 characters. It is defined in the file This usage is only documented in each driver source file if at all. + nf_conntrack.acct= + [NETFILTER] Enable connection tracking flow accounting + 0 to disable accounting + 1 to enable accounting + Default value depends on CONFIG_NF_CT_ACCT that is + going to be removed in 2.6.29. + nfsaddrs= [NFS] See Documentation/filesystems/nfsroot.txt. @@ -1534,6 +1573,9 @@ and is between 256 and 4096 characters. It is defined in the file Use with caution as certain devices share address decoders between ROMs and other resources. + norom [X86-32,X86_64] Do not assign address space to + expansion ROMs that do not already have + BIOS assigned address ranges. irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be assigned automatically to PCI devices. You can make the kernel exclude IRQs of your ISA cards @@ -2017,6 +2059,9 @@ and is between 256 and 4096 characters. It is defined in the file snd-ymfpci= [HW,ALSA] + softlockup_panic= + [KNL] Should the soft-lockup detector generate panics. + sonypi.*= [HW] Sony Programmable I/O Control Device driver See Documentation/sonypi.txt @@ -2081,6 +2126,12 @@ and is between 256 and 4096 characters. It is defined in the file tdfx= [HW,DRM] + test_suspend= [SUSPEND] + Specify "mem" (for Suspend-to-RAM) or "standby" (for + standby suspend) as the system sleep state to briefly + enter during system startup. The system is woken from + this state using a wakeup-capable RTC alarm. + thash_entries= [KNL,NET] Set number of hash buckets for TCP connection @@ -2108,13 +2159,6 @@ and is between 256 and 4096 characters. It is defined in the file : poll all this frequency 0: no polling (default) - tipar.timeout= [HW,PPT] - Set communications timeout in tenths of a second - (default 15). - - tipar.delay= [HW,PPT] - Set inter-bit delay in microseconds (default 10). - tmscsim= [HW,SCSI] See comment before function dc390_setup() in drivers/scsi/tmscsim.c. @@ -2148,6 +2192,10 @@ and is between 256 and 4096 characters. It is defined in the file Note that genuine overcurrent events won't be reported either. + unknown_nmi_panic + [X86-32,X86-64] + Set unknown_nmi_panic=1 early on boot. + usbcore.autosuspend= [USB] The autosuspend time delay (in seconds) used for newly-detected USB devices (default 2). This diff --git a/Documentation/keys.txt b/Documentation/keys.txt index d5c7a57d17007fa0f43558b270b0e4c302d5f8aa..b56aacc1fff864022dbdf67aac997d54f0d14f32 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -864,7 +864,7 @@ payload contents" for more information. request_key_with_auxdata() respectively. These two functions return with the key potentially still under - construction. To wait for contruction completion, the following should be + construction. To wait for construction completion, the following should be called: int wait_for_key_construction(struct key *key, bool intr); diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 6877e71871132b92d60a603fe444f93af53f0aa1..a79633d702bff23230bb0d76cb68565e078b4fb2 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -172,6 +172,7 @@ architectures: - ia64 (Does not support probes on instruction slot1.) - sparc64 (Return probes not yet implemented.) - arm +- ppc 3. Configuring Kprobes diff --git a/Documentation/laptops/acer-wmi.txt b/Documentation/laptops/acer-wmi.txt index 79b7dbd2214190ae9bf9bce988faa176491c5bce..69b5dd4e5a59bd74550397266122c465a8a12b5f 100644 --- a/Documentation/laptops/acer-wmi.txt +++ b/Documentation/laptops/acer-wmi.txt @@ -174,8 +174,6 @@ The LED is exposed through the LED subsystem, and can be found in: The mail LED is autodetected, so if you don't have one, the LED device won't be registered. -If you have a mail LED that is not green, please report this to me. - Backlight ********* diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 64b3f146e4b09aa4d2cd312e97689f9eba503347..02dc748b76c4ac861549077392ef1deb3ca9c652 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1,7 +1,7 @@ ThinkPad ACPI Extras Driver - Version 0.20 - April 09th, 2008 + Version 0.21 + May 29th, 2008 Borislav Deianov Henrique de Moraes Holschuh @@ -621,7 +621,8 @@ Bluetooth --------- procfs: /proc/acpi/ibm/bluetooth -sysfs device attribute: bluetooth_enable +sysfs device attribute: bluetooth_enable (deprecated) +sysfs rfkill class: switch "tpacpi_bluetooth_sw" This feature shows the presence and current state of a ThinkPad Bluetooth device in the internal ThinkPad CDC slot. @@ -643,8 +644,12 @@ Sysfs notes: 0: disables Bluetooth / Bluetooth is disabled 1: enables Bluetooth / Bluetooth is enabled. - Note: this interface will be probably be superseded by the - generic rfkill class, so it is NOT to be considered stable yet. + Note: this interface has been superseded by the generic rfkill + class. It has been deprecated, and it will be removed in year + 2010. + + rfkill controller switch "tpacpi_bluetooth_sw": refer to + Documentation/rfkill.txt for details. Video output control -- /proc/acpi/ibm/video -------------------------------------------- @@ -1374,7 +1379,8 @@ EXPERIMENTAL: WAN ----------------- procfs: /proc/acpi/ibm/wan -sysfs device attribute: wwan_enable +sysfs device attribute: wwan_enable (deprecated) +sysfs rfkill class: switch "tpacpi_wwan_sw" This feature is marked EXPERIMENTAL because the implementation directly accesses hardware registers and may not work as expected. USE @@ -1404,8 +1410,12 @@ Sysfs notes: 0: disables WWAN card / WWAN card is disabled 1: enables WWAN card / WWAN card is enabled. - Note: this interface will be probably be superseded by the - generic rfkill class, so it is NOT to be considered stable yet. + Note: this interface has been superseded by the generic rfkill + class. It has been deprecated, and it will be removed in year + 2010. + + rfkill controller switch "tpacpi_wwan_sw": refer to + Documentation/rfkill.txt for details. Multiple Commands, Module Parameters ------------------------------------ diff --git a/Documentation/leds-class.txt b/Documentation/leds-class.txt index 18860ad9935a7876746c00660f36b9fd90529499..6399557cdab3d6542a0feea0c0cd2c6b2af5ffad 100644 --- a/Documentation/leds-class.txt +++ b/Documentation/leds-class.txt @@ -59,7 +59,7 @@ Hardware accelerated blink of LEDs Some LEDs can be programmed to blink without any CPU interaction. To support this feature, a LED driver can optionally implement the -blink_set() function (see ). If implemeted, triggers can +blink_set() function (see ). If implemented, triggers can attempt to use it before falling back to software timers. The blink_set() function should return 0 if the blink setting is supported, or -EINVAL otherwise, which means that LED blinking will be handled by software. diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 82fafe0429fed8dc441411059be52b33154dd8a8..b88b0ea54e90c597896aac6eb478e1b56bab8bd0 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -36,11 +36,13 @@ #include #include #include +#include #include "linux/lguest_launcher.h" #include "linux/virtio_config.h" #include "linux/virtio_net.h" #include "linux/virtio_blk.h" #include "linux/virtio_console.h" +#include "linux/virtio_rng.h" #include "linux/virtio_ring.h" #include "asm-x86/bootparam.h" /*L:110 We can ignore the 39 include files we need for this program, but I do @@ -64,8 +66,8 @@ typedef uint8_t u8; #endif /* We can have up to 256 pages for devices. */ #define DEVICE_PAGES 256 -/* This will occupy 2 pages: it must be a power of 2. */ -#define VIRTQUEUE_NUM 128 +/* This will occupy 3 pages: it must be a power of 2. */ +#define VIRTQUEUE_NUM 256 /*L:120 verbose is both a global flag and a macro. The C preprocessor allows * this, and although I wouldn't recommend it, it works quite nicely here. */ @@ -74,12 +76,19 @@ static bool verbose; do { if (verbose) printf(args); } while(0) /*:*/ -/* The pipe to send commands to the waker process */ -static int waker_fd; +/* File descriptors for the Waker. */ +struct { + int pipe[2]; + int lguest_fd; +} waker_fds; + /* The pointer to the start of guest memory. */ static void *guest_base; /* The maximum guest physical address allowed, and maximum possible. */ static unsigned long guest_limit, guest_max; +/* The pipe for signal hander to write to. */ +static int timeoutpipe[2]; +static unsigned int timeout_usec = 500; /* a per-cpu variable indicating whose vcpu is currently running */ static unsigned int __thread cpu_id; @@ -155,11 +164,14 @@ struct virtqueue /* Last available index we saw. */ u16 last_avail_idx; - /* The routine to call when the Guest pings us. */ - void (*handle_output)(int fd, struct virtqueue *me); + /* The routine to call when the Guest pings us, or timeout. */ + void (*handle_output)(int fd, struct virtqueue *me, bool timeout); /* Outstanding buffers */ unsigned int inflight; + + /* Is this blocked awaiting a timer? */ + bool blocked; }; /* Remember the arguments to the program so we can "reboot" */ @@ -190,6 +202,9 @@ static void *_convert(struct iovec *iov, size_t size, size_t align, return iov->iov_base; } +/* Wrapper for the last available index. Makes it easier to change. */ +#define lg_last_avail(vq) ((vq)->last_avail_idx) + /* The virtio configuration space is defined to be little-endian. x86 is * little-endian too, but it's nice to be explicit so we have these helpers. */ #define cpu_to_le16(v16) (v16) @@ -199,6 +214,33 @@ static void *_convert(struct iovec *iov, size_t size, size_t align, #define le32_to_cpu(v32) (v32) #define le64_to_cpu(v64) (v64) +/* Is this iovec empty? */ +static bool iov_empty(const struct iovec iov[], unsigned int num_iov) +{ + unsigned int i; + + for (i = 0; i < num_iov; i++) + if (iov[i].iov_len) + return false; + return true; +} + +/* Take len bytes from the front of this iovec. */ +static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len) +{ + unsigned int i; + + for (i = 0; i < num_iov; i++) { + unsigned int used; + + used = iov[i].iov_len < len ? iov[i].iov_len : len; + iov[i].iov_base += used; + iov[i].iov_len -= used; + len -= used; + } + assert(len == 0); +} + /* The device virtqueue descriptors are followed by feature bitmasks. */ static u8 *get_feature_bits(struct device *dev) { @@ -254,6 +296,7 @@ static void *map_zeroed_pages(unsigned int num) PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0); if (addr == MAP_FAILED) err(1, "Mmaping %u pages of /dev/zero", num); + close(fd); return addr; } @@ -540,69 +583,64 @@ static void add_device_fd(int fd) * watch, but handing a file descriptor mask through to the kernel is fairly * icky. * - * Instead, we fork off a process which watches the file descriptors and writes + * Instead, we clone off a thread which watches the file descriptors and writes * the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host * stop running the Guest. This causes the Launcher to return from the * /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset * the LHREQ_BREAK and wake us up again. * * This, of course, is merely a different *kind* of icky. + * + * Given my well-known antipathy to threads, I'd prefer to use processes. But + * it's easier to share Guest memory with threads, and trivial to share the + * devices.infds as the Launcher changes it. */ -static void wake_parent(int pipefd, int lguest_fd) +static int waker(void *unused) { - /* Add the pipe from the Launcher to the fdset in the device_list, so - * we watch it, too. */ - add_device_fd(pipefd); + /* Close the write end of the pipe: only the Launcher has it open. */ + close(waker_fds.pipe[1]); for (;;) { fd_set rfds = devices.infds; unsigned long args[] = { LHREQ_BREAK, 1 }; + unsigned int maxfd = devices.max_infd; + + /* We also listen to the pipe from the Launcher. */ + FD_SET(waker_fds.pipe[0], &rfds); + if (waker_fds.pipe[0] > maxfd) + maxfd = waker_fds.pipe[0]; /* Wait until input is ready from one of the devices. */ - select(devices.max_infd+1, &rfds, NULL, NULL, NULL); - /* Is it a message from the Launcher? */ - if (FD_ISSET(pipefd, &rfds)) { - int fd; - /* If read() returns 0, it means the Launcher has - * exited. We silently follow. */ - if (read(pipefd, &fd, sizeof(fd)) == 0) - exit(0); - /* Otherwise it's telling us to change what file - * descriptors we're to listen to. Positive means - * listen to a new one, negative means stop - * listening. */ - if (fd >= 0) - FD_SET(fd, &devices.infds); - else - FD_CLR(-fd - 1, &devices.infds); - } else /* Send LHREQ_BREAK command. */ - pwrite(lguest_fd, args, sizeof(args), cpu_id); + select(maxfd+1, &rfds, NULL, NULL, NULL); + + /* Message from Launcher? */ + if (FD_ISSET(waker_fds.pipe[0], &rfds)) { + char c; + /* If this fails, then assume Launcher has exited. + * Don't do anything on exit: we're just a thread! */ + if (read(waker_fds.pipe[0], &c, 1) != 1) + _exit(0); + continue; + } + + /* Send LHREQ_BREAK command to snap the Launcher out of it. */ + pwrite(waker_fds.lguest_fd, args, sizeof(args), cpu_id); } + return 0; } /* This routine just sets up a pipe to the Waker process. */ -static int setup_waker(int lguest_fd) -{ - int pipefd[2], child; - - /* We create a pipe to talk to the Waker, and also so it knows when the - * Launcher dies (and closes pipe). */ - pipe(pipefd); - child = fork(); - if (child == -1) - err(1, "forking"); - - if (child == 0) { - /* We are the Waker: close the "writing" end of our copy of the - * pipe and start waiting for input. */ - close(pipefd[1]); - wake_parent(pipefd[0], lguest_fd); - } - /* Close the reading end of our copy of the pipe. */ - close(pipefd[0]); +static void setup_waker(int lguest_fd) +{ + /* This pipe is closed when Launcher dies, telling Waker. */ + if (pipe(waker_fds.pipe) != 0) + err(1, "Creating pipe for Waker"); - /* Here is the fd used to talk to the waker. */ - return pipefd[1]; + /* Waker also needs to know the lguest fd */ + waker_fds.lguest_fd = lguest_fd; + + if (clone(waker, malloc(4096) + 4096, CLONE_VM | SIGCHLD, NULL) == -1) + err(1, "Creating Waker"); } /* @@ -661,19 +699,22 @@ static unsigned get_vq_desc(struct virtqueue *vq, unsigned int *out_num, unsigned int *in_num) { unsigned int i, head; + u16 last_avail; /* Check it isn't doing very strange things with descriptor numbers. */ - if ((u16)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num) + last_avail = lg_last_avail(vq); + if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num) errx(1, "Guest moved used index from %u to %u", - vq->last_avail_idx, vq->vring.avail->idx); + last_avail, vq->vring.avail->idx); /* If there's nothing new since last we looked, return invalid. */ - if (vq->vring.avail->idx == vq->last_avail_idx) + if (vq->vring.avail->idx == last_avail) return vq->vring.num; /* Grab the next descriptor number they're advertising, and increment * the index we've seen. */ - head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num]; + head = vq->vring.avail->ring[last_avail % vq->vring.num]; + lg_last_avail(vq)++; /* If their number is silly, that's a fatal mistake. */ if (head >= vq->vring.num) @@ -821,8 +862,8 @@ static bool handle_console_input(int fd, struct device *dev) unsigned long args[] = { LHREQ_BREAK, 0 }; /* Close the fd so Waker will know it has to * exit. */ - close(waker_fd); - /* Just in case waker is blocked in BREAK, send + close(waker_fds.pipe[1]); + /* Just in case Waker is blocked in BREAK, send * unbreak now. */ write(fd, args, sizeof(args)); exit(2); @@ -839,7 +880,7 @@ static bool handle_console_input(int fd, struct device *dev) /* Handling output for console is simple: we just get all the output buffers * and write them to stdout. */ -static void handle_console_output(int fd, struct virtqueue *vq) +static void handle_console_output(int fd, struct virtqueue *vq, bool timeout) { unsigned int head, out, in; int len; @@ -854,6 +895,21 @@ static void handle_console_output(int fd, struct virtqueue *vq) } } +static void block_vq(struct virtqueue *vq) +{ + struct itimerval itm; + + vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; + vq->blocked = true; + + itm.it_interval.tv_sec = 0; + itm.it_interval.tv_usec = 0; + itm.it_value.tv_sec = 0; + itm.it_value.tv_usec = timeout_usec; + + setitimer(ITIMER_REAL, &itm, NULL); +} + /* * The Network * @@ -861,22 +917,34 @@ static void handle_console_output(int fd, struct virtqueue *vq) * and write them (ignoring the first element) to this device's file descriptor * (/dev/net/tun). */ -static void handle_net_output(int fd, struct virtqueue *vq) +static void handle_net_output(int fd, struct virtqueue *vq, bool timeout) { - unsigned int head, out, in; + unsigned int head, out, in, num = 0; int len; struct iovec iov[vq->vring.num]; + static int last_timeout_num; /* Keep getting output buffers from the Guest until we run out. */ while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) { if (in) errx(1, "Input buffers in output queue?"); - /* Check header, but otherwise ignore it (we told the Guest we - * supported no features, so it shouldn't have anything - * interesting). */ - (void)convert(&iov[0], struct virtio_net_hdr); - len = writev(vq->dev->fd, iov+1, out-1); + len = writev(vq->dev->fd, iov, out); + if (len < 0) + err(1, "Writing network packet to tun"); add_used_and_trigger(fd, vq, head, len); + num++; + } + + /* Block further kicks and set up a timer if we saw anything. */ + if (!timeout && num) + block_vq(vq); + + if (timeout) { + if (num < last_timeout_num) + timeout_usec += 10; + else if (timeout_usec > 1) + timeout_usec--; + last_timeout_num = num; } } @@ -887,7 +955,6 @@ static bool handle_tun_input(int fd, struct device *dev) unsigned int head, in_num, out_num; int len; struct iovec iov[dev->vq->vring.num]; - struct virtio_net_hdr *hdr; /* First we need a network buffer from the Guests's recv virtqueue. */ head = get_vq_desc(dev->vq, iov, &out_num, &in_num); @@ -896,25 +963,23 @@ static bool handle_tun_input(int fd, struct device *dev) * early, the Guest won't be ready yet. Wait until the device * status says it's ready. */ /* FIXME: Actually want DRIVER_ACTIVE here. */ - if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) - warn("network: no dma buffer!"); + + /* Now tell it we want to know if new things appear. */ + dev->vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; + wmb(); + /* We'll turn this back on if input buffers are registered. */ return false; } else if (out_num) errx(1, "Output buffers in network recv queue?"); - /* First element is the header: we set it to 0 (no features). */ - hdr = convert(&iov[0], struct virtio_net_hdr); - hdr->flags = 0; - hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE; - /* Read the packet from the device directly into the Guest's buffer. */ - len = readv(dev->fd, iov+1, in_num-1); + len = readv(dev->fd, iov, in_num); if (len <= 0) err(1, "reading network"); /* Tell the Guest about the new packet. */ - add_used_and_trigger(fd, dev->vq, head, sizeof(*hdr) + len); + add_used_and_trigger(fd, dev->vq, head, len); verbose("tun input packet len %i [%02x %02x] (%s)\n", len, ((u8 *)iov[1].iov_base)[0], ((u8 *)iov[1].iov_base)[1], @@ -927,11 +992,18 @@ static bool handle_tun_input(int fd, struct device *dev) /*L:215 This is the callback attached to the network and console input * virtqueues: it ensures we try again, in case we stopped console or net * delivery because Guest didn't have any buffers. */ -static void enable_fd(int fd, struct virtqueue *vq) +static void enable_fd(int fd, struct virtqueue *vq, bool timeout) { add_device_fd(vq->dev->fd); - /* Tell waker to listen to it again */ - write(waker_fd, &vq->dev->fd, sizeof(vq->dev->fd)); + /* Snap the Waker out of its select loop. */ + write(waker_fds.pipe[1], "", 1); +} + +static void net_enable_fd(int fd, struct virtqueue *vq, bool timeout) +{ + /* We don't need to know again when Guest refills receive buffer. */ + vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; + enable_fd(fd, vq, timeout); } /* When the Guest tells us they updated the status field, we handle it. */ @@ -951,7 +1023,7 @@ static void update_device_status(struct device *dev) for (vq = dev->vq; vq; vq = vq->next) { memset(vq->vring.desc, 0, vring_size(vq->config.num, getpagesize())); - vq->last_avail_idx = 0; + lg_last_avail(vq) = 0; } } else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) { warnx("Device %s configuration FAILED", dev->name); @@ -960,10 +1032,10 @@ static void update_device_status(struct device *dev) verbose("Device %s OK: offered", dev->name); for (i = 0; i < dev->desc->feature_len; i++) - verbose(" %08x", get_feature_bits(dev)[i]); + verbose(" %02x", get_feature_bits(dev)[i]); verbose(", accepted"); for (i = 0; i < dev->desc->feature_len; i++) - verbose(" %08x", get_feature_bits(dev) + verbose(" %02x", get_feature_bits(dev) [dev->desc->feature_len+i]); if (dev->ready) @@ -1000,7 +1072,7 @@ static void handle_output(int fd, unsigned long addr) if (strcmp(vq->dev->name, "console") != 0) verbose("Output to %s\n", vq->dev->name); if (vq->handle_output) - vq->handle_output(fd, vq); + vq->handle_output(fd, vq, false); return; } } @@ -1014,6 +1086,29 @@ static void handle_output(int fd, unsigned long addr) strnlen(from_guest_phys(addr), guest_limit - addr)); } +static void handle_timeout(int fd) +{ + char buf[32]; + struct device *i; + struct virtqueue *vq; + + /* Clear the pipe */ + read(timeoutpipe[0], buf, sizeof(buf)); + + /* Check each device and virtqueue: flush blocked ones. */ + for (i = devices.dev; i; i = i->next) { + for (vq = i->vq; vq; vq = vq->next) { + if (!vq->blocked) + continue; + + vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; + vq->blocked = false; + if (vq->handle_output) + vq->handle_output(fd, vq, true); + } + } +} + /* This is called when the Waker wakes us up: check for incoming file * descriptors. */ static void handle_input(int fd) @@ -1024,16 +1119,20 @@ static void handle_input(int fd) for (;;) { struct device *i; fd_set fds = devices.infds; + int num; + num = select(devices.max_infd+1, &fds, NULL, NULL, &poll); + /* Could get interrupted */ + if (num < 0) + continue; /* If nothing is ready, we're done. */ - if (select(devices.max_infd+1, &fds, NULL, NULL, &poll) == 0) + if (num == 0) break; /* Otherwise, call the device(s) which have readable file * descriptors and a method of handling them. */ for (i = devices.dev; i; i = i->next) { if (i->handle_input && FD_ISSET(i->fd, &fds)) { - int dev_fd; if (i->handle_input(fd, i)) continue; @@ -1043,13 +1142,12 @@ static void handle_input(int fd) * buffers to deliver into. Console also uses * it when it discovers that stdin is closed. */ FD_CLR(i->fd, &devices.infds); - /* Tell waker to ignore it too, by sending a - * negative fd number (-1, since 0 is a valid - * FD number). */ - dev_fd = -i->fd - 1; - write(waker_fd, &dev_fd, sizeof(dev_fd)); } } + + /* Is this the timeout fd? */ + if (FD_ISSET(timeoutpipe[0], &fds)) + handle_timeout(fd); } } @@ -1098,7 +1196,7 @@ static struct lguest_device_desc *new_dev_desc(u16 type) /* Each device descriptor is followed by the description of its virtqueues. We * specify how many descriptors the virtqueue is to have. */ static void add_virtqueue(struct device *dev, unsigned int num_descs, - void (*handle_output)(int fd, struct virtqueue *me)) + void (*handle_output)(int, struct virtqueue *, bool)) { unsigned int pages; struct virtqueue **i, *vq = malloc(sizeof(*vq)); @@ -1114,6 +1212,7 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs, vq->last_avail_idx = 0; vq->dev = dev; vq->inflight = 0; + vq->blocked = false; /* Initialize the configuration. */ vq->config.num = num_descs; @@ -1246,6 +1345,24 @@ static void setup_console(void) } /*:*/ +static void timeout_alarm(int sig) +{ + write(timeoutpipe[1], "", 1); +} + +static void setup_timeout(void) +{ + if (pipe(timeoutpipe) != 0) + err(1, "Creating timeout pipe"); + + if (fcntl(timeoutpipe[1], F_SETFL, + fcntl(timeoutpipe[1], F_GETFL) | O_NONBLOCK) != 0) + err(1, "Making timeout pipe nonblocking"); + + add_device_fd(timeoutpipe[0]); + signal(SIGALRM, timeout_alarm); +} + /*M:010 Inter-guest networking is an interesting area. Simplest is to have a * --sharenet= option which opens or creates a named pipe. This can be * used to send packets to another guest in a 1:1 manner. @@ -1264,10 +1381,25 @@ static void setup_console(void) static u32 str2ip(const char *ipaddr) { - unsigned int byte[4]; + unsigned int b[4]; - sscanf(ipaddr, "%u.%u.%u.%u", &byte[0], &byte[1], &byte[2], &byte[3]); - return (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]; + if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4) + errx(1, "Failed to parse IP address '%s'", ipaddr); + return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]; +} + +static void str2mac(const char *macaddr, unsigned char mac[6]) +{ + unsigned int m[6]; + if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x", + &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6) + errx(1, "Failed to parse mac address '%s'", macaddr); + mac[0] = m[0]; + mac[1] = m[1]; + mac[2] = m[2]; + mac[3] = m[3]; + mac[4] = m[4]; + mac[5] = m[5]; } /* This code is "adapted" from libbridge: it attaches the Host end of the @@ -1288,6 +1420,7 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name) errx(1, "interface %s does not exist!", if_name); strncpy(ifr.ifr_name, br_name, IFNAMSIZ); + ifr.ifr_name[IFNAMSIZ-1] = '\0'; ifr.ifr_ifindex = ifidx; if (ioctl(fd, SIOCBRADDIF, &ifr) < 0) err(1, "can't add %s to bridge %s", if_name, br_name); @@ -1296,64 +1429,90 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name) /* This sets up the Host end of the network device with an IP address, brings * it up so packets will flow, the copies the MAC address into the hwaddr * pointer. */ -static void configure_device(int fd, const char *devname, u32 ipaddr, - unsigned char hwaddr[6]) +static void configure_device(int fd, const char *tapif, u32 ipaddr) { struct ifreq ifr; struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr; - /* Don't read these incantations. Just cut & paste them like I did! */ memset(&ifr, 0, sizeof(ifr)); - strcpy(ifr.ifr_name, devname); + strcpy(ifr.ifr_name, tapif); + + /* Don't read these incantations. Just cut & paste them like I did! */ sin->sin_family = AF_INET; sin->sin_addr.s_addr = htonl(ipaddr); if (ioctl(fd, SIOCSIFADDR, &ifr) != 0) - err(1, "Setting %s interface address", devname); + err(1, "Setting %s interface address", tapif); ifr.ifr_flags = IFF_UP; if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0) - err(1, "Bringing interface %s up", devname); + err(1, "Bringing interface %s up", tapif); +} + +static void get_mac(int fd, const char *tapif, unsigned char hwaddr[6]) +{ + struct ifreq ifr; + + memset(&ifr, 0, sizeof(ifr)); + strcpy(ifr.ifr_name, tapif); /* SIOC stands for Socket I/O Control. G means Get (vs S for Set * above). IF means Interface, and HWADDR is hardware address. * Simple! */ if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) - err(1, "getting hw address for %s", devname); + err(1, "getting hw address for %s", tapif); memcpy(hwaddr, ifr.ifr_hwaddr.sa_data, 6); } -/*L:195 Our network is a Host<->Guest network. This can either use bridging or - * routing, but the principle is the same: it uses the "tun" device to inject - * packets into the Host as if they came in from a normal network card. We - * just shunt packets between the Guest and the tun device. */ -static void setup_tun_net(const char *arg) +static int get_tun_device(char tapif[IFNAMSIZ]) { - struct device *dev; struct ifreq ifr; - int netfd, ipfd; - u32 ip; - const char *br_name = NULL; - struct virtio_net_config conf; + int netfd; + + /* Start with this zeroed. Messy but sure. */ + memset(&ifr, 0, sizeof(ifr)); /* We open the /dev/net/tun device and tell it we want a tap device. A * tap device is like a tun device, only somehow different. To tell * the truth, I completely blundered my way through this code, but it * works now! */ netfd = open_or_die("/dev/net/tun", O_RDWR); - memset(&ifr, 0, sizeof(ifr)); - ifr.ifr_flags = IFF_TAP | IFF_NO_PI; + ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; strcpy(ifr.ifr_name, "tap%d"); if (ioctl(netfd, TUNSETIFF, &ifr) != 0) err(1, "configuring /dev/net/tun"); + + if (ioctl(netfd, TUNSETOFFLOAD, + TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0) + err(1, "Could not set features for tun device"); + /* We don't need checksums calculated for packets coming in this * device: trust us! */ ioctl(netfd, TUNSETNOCSUM, 1); + memcpy(tapif, ifr.ifr_name, IFNAMSIZ); + return netfd; +} + +/*L:195 Our network is a Host<->Guest network. This can either use bridging or + * routing, but the principle is the same: it uses the "tun" device to inject + * packets into the Host as if they came in from a normal network card. We + * just shunt packets between the Guest and the tun device. */ +static void setup_tun_net(char *arg) +{ + struct device *dev; + int netfd, ipfd; + u32 ip = INADDR_ANY; + bool bridging = false; + char tapif[IFNAMSIZ], *p; + struct virtio_net_config conf; + + netfd = get_tun_device(tapif); + /* First we create a new network device. */ dev = new_device("net", VIRTIO_ID_NET, netfd, handle_tun_input); /* Network devices need a receive and a send queue, just like * console. */ - add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd); + add_virtqueue(dev, VIRTQUEUE_NUM, net_enable_fd); add_virtqueue(dev, VIRTQUEUE_NUM, handle_net_output); /* We need a socket to perform the magic network ioctls to bring up the @@ -1364,28 +1523,56 @@ static void setup_tun_net(const char *arg) /* If the command line was --tunnet=bridge: do bridging. */ if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) { - ip = INADDR_ANY; - br_name = arg + strlen(BRIDGE_PFX); - add_to_bridge(ipfd, ifr.ifr_name, br_name); - } else /* It is an IP address to set up the device with */ + arg += strlen(BRIDGE_PFX); + bridging = true; + } + + /* A mac address may follow the bridge name or IP address */ + p = strchr(arg, ':'); + if (p) { + str2mac(p+1, conf.mac); + *p = '\0'; + } else { + p = arg + strlen(arg); + /* None supplied; query the randomly assigned mac. */ + get_mac(ipfd, tapif, conf.mac); + } + + /* arg is now either an IP address or a bridge name */ + if (bridging) + add_to_bridge(ipfd, tapif, arg); + else ip = str2ip(arg); - /* Set up the tun device, and get the mac address for the interface. */ - configure_device(ipfd, ifr.ifr_name, ip, conf.mac); + /* Set up the tun device. */ + configure_device(ipfd, tapif, ip); /* Tell Guest what MAC address to use. */ add_feature(dev, VIRTIO_NET_F_MAC); add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY); + /* Expect Guest to handle everything except UFO */ + add_feature(dev, VIRTIO_NET_F_CSUM); + add_feature(dev, VIRTIO_NET_F_GUEST_CSUM); + add_feature(dev, VIRTIO_NET_F_MAC); + add_feature(dev, VIRTIO_NET_F_GUEST_TSO4); + add_feature(dev, VIRTIO_NET_F_GUEST_TSO6); + add_feature(dev, VIRTIO_NET_F_GUEST_ECN); + add_feature(dev, VIRTIO_NET_F_HOST_TSO4); + add_feature(dev, VIRTIO_NET_F_HOST_TSO6); + add_feature(dev, VIRTIO_NET_F_HOST_ECN); set_config(dev, sizeof(conf), &conf); /* We don't need the socket any more; setup is done. */ close(ipfd); - verbose("device %u: tun net %u.%u.%u.%u\n", - devices.device_num++, - (u8)(ip>>24),(u8)(ip>>16),(u8)(ip>>8),(u8)ip); - if (br_name) - verbose("attached to bridge: %s\n", br_name); + devices.device_num++; + + if (bridging) + verbose("device %u: tun %s attached to bridge: %s\n", + devices.device_num, tapif, arg); + else + verbose("device %u: tun %s: %s\n", + devices.device_num, tapif, arg); } /* Our block (disk) device should be really simple: the Guest asks for a block @@ -1550,7 +1737,7 @@ static bool handle_io_finish(int fd, struct device *dev) } /* When the Guest submits some I/O, we just need to wake the I/O thread. */ -static void handle_virtblk_output(int fd, struct virtqueue *vq) +static void handle_virtblk_output(int fd, struct virtqueue *vq, bool timeout) { struct vblk_info *vblk = vq->dev->priv; char c = 0; @@ -1621,6 +1808,64 @@ static void setup_block_file(const char *filename) verbose("device %u: virtblock %llu sectors\n", devices.device_num, le64_to_cpu(conf.capacity)); } + +/* Our random number generator device reads from /dev/random into the Guest's + * input buffers. The usual case is that the Guest doesn't want random numbers + * and so has no buffers although /dev/random is still readable, whereas + * console is the reverse. + * + * The same logic applies, however. */ +static bool handle_rng_input(int fd, struct device *dev) +{ + int len; + unsigned int head, in_num, out_num, totlen = 0; + struct iovec iov[dev->vq->vring.num]; + + /* First we need a buffer from the Guests's virtqueue. */ + head = get_vq_desc(dev->vq, iov, &out_num, &in_num); + + /* If they're not ready for input, stop listening to this file + * descriptor. We'll start again once they add an input buffer. */ + if (head == dev->vq->vring.num) + return false; + + if (out_num) + errx(1, "Output buffers in rng?"); + + /* This is why we convert to iovecs: the readv() call uses them, and so + * it reads straight into the Guest's buffer. We loop to make sure we + * fill it. */ + while (!iov_empty(iov, in_num)) { + len = readv(dev->fd, iov, in_num); + if (len <= 0) + err(1, "Read from /dev/random gave %i", len); + iov_consume(iov, in_num, len); + totlen += len; + } + + /* Tell the Guest about the new input. */ + add_used_and_trigger(fd, dev->vq, head, totlen); + + /* Everything went OK! */ + return true; +} + +/* And this creates a "hardware" random number device for the Guest. */ +static void setup_rng(void) +{ + struct device *dev; + int fd; + + fd = open_or_die("/dev/random", O_RDONLY); + + /* The device responds to return from I/O thread. */ + dev = new_device("rng", VIRTIO_ID_RNG, fd, handle_rng_input); + + /* The device has one virtqueue, where the Guest places inbufs. */ + add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd); + + verbose("device %u: rng\n", devices.device_num++); +} /* That's the end of device setup. */ /*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */ @@ -1628,11 +1873,12 @@ static void __attribute__((noreturn)) restart_guest(void) { unsigned int i; - /* Closing pipes causes the Waker thread and io_threads to die, and - * closing /dev/lguest cleans up the Guest. Since we don't track all - * open fds, we simply close everything beyond stderr. */ + /* Since we don't track all open fds, we simply close everything beyond + * stderr. */ for (i = 3; i < FD_SETSIZE; i++) close(i); + + /* The exec automatically gets rid of the I/O and Waker threads. */ execv(main_args[0], main_args); err(1, "Could not exec %s", main_args[0]); } @@ -1663,7 +1909,7 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd) /* ERESTART means that we need to reboot the guest */ } else if (errno == ERESTART) { restart_guest(); - /* EAGAIN means the Waker wanted us to look at some input. + /* EAGAIN means a signal (timeout). * Anything else means a bug or incompatible change. */ } else if (errno != EAGAIN) err(1, "Running guest failed"); @@ -1691,13 +1937,14 @@ static struct option opts[] = { { "verbose", 0, NULL, 'v' }, { "tunnet", 1, NULL, 't' }, { "block", 1, NULL, 'b' }, + { "rng", 0, NULL, 'r' }, { "initrd", 1, NULL, 'i' }, { NULL }, }; static void usage(void) { errx(1, "Usage: lguest [--verbose] " - "[--tunnet=(|bridge:)\n" + "[--tunnet=(:|bridge::)\n" "|--block=|--initrd=]...\n" " vmlinux [args...]"); } @@ -1765,6 +2012,9 @@ int main(int argc, char *argv[]) case 'b': setup_block_file(optarg); break; + case 'r': + setup_rng(); + break; case 'i': initrd_name = optarg; break; @@ -1783,6 +2033,9 @@ int main(int argc, char *argv[]) /* We always have a console device */ setup_console(); + /* We can timeout waiting for Guest network transmit. */ + setup_timeout(); + /* Now we load the kernel */ start = load_kernel(open_or_die(argv[optind+1], O_RDONLY)); @@ -1826,10 +2079,10 @@ int main(int argc, char *argv[]) * /dev/lguest file descriptor. */ lguest_fd = tell_kernel(pgdir, start); - /* We fork off a child process, which wakes the Launcher whenever one - * of the input file descriptors needs attention. We call this the - * Waker, and we'll cover it in a moment. */ - waker_fd = setup_waker(lguest_fd); + /* We clone off a thread, which wakes the Launcher whenever one of the + * input file descriptors needs attention. We call this the Waker, and + * we'll cover it in a moment. */ + setup_waker(lguest_fd); /* Finally, run the Guest. This doesn't return. */ run_guest(lguest_fd); diff --git a/Documentation/local_ops.txt b/Documentation/local_ops.txt index 4269a1105b378fafcc689435a2531b9d9d4287db..f4f8b1c6c8ba45ba6ec351aa92c40e798024983c 100644 --- a/Documentation/local_ops.txt +++ b/Documentation/local_ops.txt @@ -36,7 +36,7 @@ It can be done by slightly modifying the standard atomic operations : only their UP variant must be kept. It typically means removing LOCK prefix (on i386 and x86_64) and any SMP sychronization barrier. If the architecture does not have a different behavior between SMP and UP, including asm-generic/local.h -in your archtecture's local.h is sufficient. +in your architecture's local.h is sufficient. The local_t type is defined as an opaque signed long by embedding an atomic_long_t inside a structure. This is made so a cast from this type to a diff --git a/Documentation/md.txt b/Documentation/md.txt index a8b430627473aa243995ab6f6e173b9cf1ff819e..1da9d1b1793f3436b3561a48de7fbcd8c893e74e 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -236,6 +236,11 @@ All md devices contain: writing the word for the desired state, however some states cannot be explicitly set, and some transitions are not allowed. + Select/poll works on this file. All changes except between + active_idle and active (which can be frequent and are not + very interesting) are notified. active->active_idle is + reported if the metadata is externally managed. + clear No devices, no size, no level Writing is equivalent to STOP_ARRAY ioctl @@ -292,6 +297,10 @@ Each directory contains: writemostly - device will only be subject to read requests if there are no other options. This applies only to raid1 arrays. + blocked - device has failed, metadata is "external", + and the failure hasn't been acknowledged yet. + Writes that would write to this device if + it were not faulty are blocked. spare - device is working, but not a full member. This includes spares that are in the process of being recovered to @@ -301,6 +310,12 @@ Each directory contains: Writing "remove" removes the device from the array. Writing "writemostly" sets the writemostly flag. Writing "-writemostly" clears the writemostly flag. + Writing "blocked" sets the "blocked" flag. + Writing "-blocked" clear the "blocked" flag and allows writes + to complete. + + This file responds to select/poll. Any change to 'faulty' + or 'blocked' causes an event. errors An approximate count of read errors that have been detected on @@ -332,7 +347,7 @@ Each directory contains: for storage of data. This will normally be the same as the component_size. This can be written while assembling an array. If a value less than the current component_size is - written, component_size will be reduced to this value. + written, it will be rejected. An active md device will also contain and entry for each active device @@ -381,6 +396,19 @@ also have 'check' and 'repair' will start the appropriate process providing the current state is 'idle'. + This file responds to select/poll. Any important change in the value + triggers a poll event. Sometimes the value will briefly be + "recover" if a recovery seems to be needed, but cannot be + achieved. In that case, the transition to "recover" isn't + notified, but the transition away is. + + degraded + This contains a count of the number of devices by which the + arrays is degraded. So an optimal array with show '0'. A + single failed/missing drive will show '1', etc. + This file responds to select/poll, any increase or decrease + in the count of missing devices will trigger an event. + mismatch_count When performing 'check' and 'repair', and possibly when performing 'resync', md will count the number of errors that are diff --git a/Documentation/moxa-smartio b/Documentation/moxa-smartio index fe24ecc6372e6b3482c3f39972ab9f70077d7b70..5337e80a5b96c6341e523394be6fedc58b87f6ed 100644 --- a/Documentation/moxa-smartio +++ b/Documentation/moxa-smartio @@ -1,14 +1,22 @@ ============================================================================= - - MOXA Smartio Family Device Driver Ver 1.1 Installation Guide - for Linux Kernel 2.2.x and 2.0.3x - Copyright (C) 1999, Moxa Technologies Co, Ltd. + MOXA Smartio/Industio Family Device Driver Installation Guide + for Linux Kernel 2.4.x, 2.6.x + Copyright (C) 2008, Moxa Inc. ============================================================================= +Date: 01/21/2008 + Content 1. Introduction 2. System Requirement 3. Installation + 3.1 Hardware installation + 3.2 Driver files + 3.3 Device naming convention + 3.4 Module driver configuration + 3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x. + 3.6 Custom configuration + 3.7 Verify driver installation 4. Utilities 5. Setserial 6. Troubleshooting @@ -16,27 +24,48 @@ Content ----------------------------------------------------------------------------- 1. Introduction - The Smartio family Linux driver, Ver. 1.1, supports following multiport + The Smartio/Industio/UPCI family Linux driver supports following multiport boards. - -C104P/H/HS, C104H/PCI, C104HS/PCI, CI-104J 4 port multiport board. - -C168P/H/HS, C168H/PCI 8 port multiport board. - - This driver has been modified a little and cleaned up from the Moxa - contributed driver code and merged into Linux 2.2.14pre. In particular - official major/minor numbers have been assigned which are different to - those the original Moxa supplied driver used. + - 2 ports multiport board + CP-102U, CP-102UL, CP-102UF + CP-132U-I, CP-132UL, + CP-132, CP-132I, CP132S, CP-132IS, + CI-132, CI-132I, CI-132IS, + (C102H, C102HI, C102HIS, C102P, CP-102, CP-102S) + + - 4 ports multiport board + CP-104EL, + CP-104UL, CP-104JU, + CP-134U, CP-134U-I, + C104H/PCI, C104HS/PCI, + CP-114, CP-114I, CP-114S, CP-114IS, CP-114UL, + C104H, C104HS, + CI-104J, CI-104JS, + CI-134, CI-134I, CI-134IS, + (C114HI, CT-114I, C104P) + POS-104UL, + CB-114, + CB-134I + + - 8 ports multiport board + CP-118EL, CP-168EL, + CP-118U, CP-168U, + C168H/PCI, + C168H, C168HS, + (C168P), + CB-108 This driver and installation procedure have been developed upon Linux Kernel - 2.2.5 and backward compatible to 2.0.3x. This driver supports Intel x86 and - Alpha hardware platform. In order to maintain compatibility, this version - has also been properly tested with RedHat, OpenLinux, TurboLinux and - S.u.S.E Linux. However, if compatibility problem occurs, please contact - Moxa at support@moxa.com.tw. + 2.4.x and 2.6.x. This driver supports Intel x86 hardware platform. In order + to maintain compatibility, this version has also been properly tested with + RedHat, Mandrake, Fedora and S.u.S.E Linux. However, if compatibility problem + occurs, please contact Moxa at support@moxa.com.tw. In addition to device driver, useful utilities are also provided in this version. They are - - msdiag Diagnostic program for detecting installed Moxa Smartio boards. + - msdiag Diagnostic program for displaying installed Moxa + Smartio/Industio boards. - msmon Monitor program to observe data count and line status signals. - msterm A simple terminal program which is useful in testing serial ports. @@ -47,8 +76,7 @@ Content GNU General Public License in this version. Please refer to GNU General Public License announcement in each source code file for more detail. - In Moxa's ftp sites, you may always find latest driver at - ftp://ftp.moxa.com or ftp://ftp.moxa.com.tw. + In Moxa's Web sites, you may always find latest driver at http://web.moxa.com. This version of driver can be installed as Loadable Module (Module driver) or built-in into kernel (Static driver). You may refer to following @@ -61,18 +89,27 @@ Content ----------------------------------------------------------------------------- 2. System Requirement - - Hardware platform: Intel x86 or Alpha machine - - Kernel version: 2.0.3x or 2.2.x + - Hardware platform: Intel x86 machine + - Kernel version: 2.4.x or 2.6.x - gcc version 2.72 or later - Maximum 4 boards can be installed in combination ----------------------------------------------------------------------------- 3. Installation + 3.1 Hardware installation + 3.2 Driver files + 3.3 Device naming convention + 3.4 Module driver configuration + 3.5 Static driver configuration for Linux kernel 2.4.x, 2.6.x. + 3.6 Custom configuration + 3.7 Verify driver installation + + 3.1 Hardware installation - There are two types of buses, ISA and PCI, for Smartio family multiport - board. + There are two types of buses, ISA and PCI, for Smartio/Industio + family multiport board. ISA board --------- @@ -81,47 +118,57 @@ Content installation procedure in User's Manual before proceed any further. Please make sure the JP1 is open after the ISA board is set properly. - PCI board - --------- + PCI/UPCI board + -------------- You may need to adjust IRQ usage in BIOS to avoid from IRQ conflict with other ISA devices. Please refer to hardware installation procedure in User's Manual in advance. - IRQ Sharing + PCI IRQ Sharing ----------- Each port within the same multiport board shares the same IRQ. Up to - 4 Moxa Smartio Family multiport boards can be installed together on - one system and they can share the same IRQ. + 4 Moxa Smartio/Industio PCI Family multiport boards can be installed + together on one system and they can share the same IRQ. + - 3.2 Driver files and device naming convention + 3.2 Driver files The driver file may be obtained from ftp, CD-ROM or floppy disk. The first step, anyway, is to copy driver file "mxser.tgz" into specified directory. e.g. /moxa. The execute commands as below. + # cd / + # mkdir moxa # cd /moxa - # tar xvf /dev/fd0 + # tar xvf /dev/fd0 + or + + # cd / + # mkdir moxa # cd /moxa # cp /mnt/cdrom//mxser.tgz . # tar xvfz mxser.tgz + + 3.3 Device naming convention + You may find all the driver and utilities files in /moxa/mxser. Following installation procedure depends on the model you'd like to - run the driver. If you prefer module driver, please refer to 3.3. - If static driver is required, please refer to 3.4. + run the driver. If you prefer module driver, please refer to 3.4. + If static driver is required, please refer to 3.5. Dialin and callout port ----------------------- - This driver remains traditional serial device properties. There're + This driver remains traditional serial device properties. There are two special file name for each serial port. One is dial-in port which is named "ttyMxx". For callout port, the naming convention is "cumxx". Device naming when more than 2 boards installed ----------------------------------------------- - Naming convention for each Smartio multiport board is pre-defined - as below. + Naming convention for each Smartio/Industio multiport board is + pre-defined as below. Board Num. Dial-in Port Callout port 1st board ttyM0 - ttyM7 cum0 - cum7 @@ -129,6 +176,12 @@ Content 3rd board ttyM16 - ttyM23 cum16 - cum23 4th board ttyM24 - ttym31 cum24 - cum31 + + !!!!!!!!!!!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + Under Kernel 2.6 the cum Device is Obsolete. So use ttyM* + device instead. + !!!!!!!!!!!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + Board sequence -------------- This driver will activate ISA boards according to the parameter set @@ -138,69 +191,131 @@ Content For PCI boards, their sequence will be after ISA boards and C168H/PCI has higher priority than C104H/PCI boards. - 3.3 Module driver configuration + 3.4 Module driver configuration Module driver is easiest way to install. If you prefer static driver installation, please skip this paragraph. - 1. Find "Makefile" in /moxa/mxser, then run - # make install + + ------------- Prepare to use the MOXA driver-------------------- + 3.4.1 Create tty device with correct major number + Before using MOXA driver, your system must have the tty devices + which are created with driver's major number. We offer one shell + script "msmknod" to simplify the procedure. + This step is only needed to be executed once. But you still + need to do this procedure when: + a. You change the driver's major number. Please refer the "3.7" + section. + b. Your total installed MOXA boards number is changed. Maybe you + add/delete one MOXA board. + c. You want to change the tty name. This needs to modify the + shell script "msmknod" + + The procedure is: + # cd /moxa/mxser/driver + # ./msmknod + + This shell script will require the major number for dial-in + device and callout device to create tty device. You also need + to specify the total installed MOXA board number. Default major + numbers for dial-in device and callout device are 30, 35. If + you need to change to other number, please refer section "3.7" + for more detailed procedure. + Msmknod will delete any special files occupying the same device + naming. + + 3.4.2 Build the MOXA driver and utilities + Before using the MOXA driver and utilities, you need compile the + all the source code. This step is only need to be executed once. + But you still re-compile the source code if you modify the source + code. For example, if you change the driver's major number (see + "3.7" section), then you need to do this step again. + + Find "Makefile" in /moxa/mxser, then run + + # make clean; make install + + !!!!!!!!!! NOTE !!!!!!!!!!!!!!!!! + For Red Hat 9, Red Hat Enterprise Linux AS3/ES3/WS3 & Fedora Core1: + # make clean; make installsp1 + + For Red Hat Enterprise Linux AS4/ES4/WS4: + # make clean; make installsp2 + !!!!!!!!!! NOTE !!!!!!!!!!!!!!!!! The driver files "mxser.o" and utilities will be properly compiled - and copied to system directories respectively.Then run + and copied to system directories respectively. - # insmod mxser + ------------- Load MOXA driver-------------------- + 3.4.3 Load the MOXA driver - to activate the modular driver. You may run "lsmod" to check - if "mxser.o" is activated. + # modprobe mxser - 2. Create special files by executing "msmknod". - # cd /moxa/mxser/driver - # ./msmknod + will activate the module driver. You may run "lsmod" to check + if "mxser" is activated. If the MOXA board is ISA board, the + is needed. Please refer to section "3.4.5" for more + information. + + + ------------- Load MOXA driver on boot -------------------- + 3.4.4 For the above description, you may manually execute + "modprobe mxser" to activate this driver and run + "rmmod mxser" to remove it. + However, it's better to have a boot time configuration to + eliminate manual operation. Boot time configuration can be + achieved by rc file. We offer one "rc.mxser" file to simplify + the procedure under "moxa/mxser/driver". - Default major numbers for dial-in device and callout device are - 174, 175. Msmknod will delete any special files occupying the same - device naming. + But if you use ISA board, please modify the "modprobe ..." command + to add the argument (see "3.4.5" section). After modifying the + rc.mxser, please try to execute "/moxa/mxser/driver/rc.mxser" + manually to make sure the modification is ok. If any error + encountered, please try to modify again. If the modification is + completed, follow the below step. - 3. Up to now, you may manually execute "insmod mxser" to activate - this driver and run "rmmod mxser" to remove it. However, it's - better to have a boot time configuration to eliminate manual - operation. - Boot time configuration can be achieved by rc file. Run following - command for setting rc files. + Run following command for setting rc files. # cd /moxa/mxser/driver # cp ./rc.mxser /etc/rc.d # cd /etc/rc.d - You may have to modify part of the content in rc.mxser to specify - parameters for ISA board. Please refer to rc.mxser for more detail. - Find "rc.serial". If "rc.serial" doesn't exist, create it by vi. - Add "rc.mxser" in last line. Next, open rc.local by vi - and append following content. + Check "rc.serial" is existed or not. If "rc.serial" doesn't exist, + create it by vi, run "chmod 755 rc.serial" to change the permission. + Add "/etc/rc.d/rc.mxser" in last line, - if [ -f /etc/rc.d/rc.serial ]; then - sh /etc/rc.d/rc.serial - fi + Reboot and check if moxa.o activated by "lsmod" command. - 4. Reboot and check if mxser.o activated by "lsmod" command. - 5. If you'd like to drive Smartio ISA boards in the system, you'll - have to add parameter to specify CAP address of given board while - activating "mxser.o". The format for parameters are as follows. + 3.4.5. If you'd like to drive Smartio/Industio ISA boards in the system, + you'll have to add parameter to specify CAP address of given + board while activating "mxser.o". The format for parameters are + as follows. - insmod mxser ioaddr=0x???,0x???,0x???,0x??? + modprobe mxser ioaddr=0x???,0x???,0x???,0x??? | | | | | | | +- 4th ISA board | | +------ 3rd ISA board | +------------ 2nd ISA board +------------------- 1st ISA board - 3.4 Static driver configuration + 3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x + + Note: To use static driver, you must install the linux kernel + source package. + + 3.5.1 Backup the built-in driver in the kernel. + # cd /usr/src/linux/drivers/char + # mv mxser.c mxser.c.old + + For Red Hat 7.x user, you need to create link: + # cd /usr/src + # ln -s linux-2.4 linux - 1. Create link + 3.5.2 Create link # cd /usr/src/linux/drivers/char # ln -s /moxa/mxser/driver/mxser.c mxser.c - 2. Add CAP address list for ISA boards + 3.5.3 Add CAP address list for ISA boards. For PCI boards user, + please skip this step. + In module mode, the CAP address for ISA board is given by parameter. In static driver configuration, you'll have to assign it within driver's source code. If you will not @@ -222,73 +337,55 @@ Content static int mxserBoardCAP[] = {0x280, 0x180, 0x00, 0x00}; - 3. Modify tty_io.c - # cd /usr/src/linux/drivers/char/ - # vi tty_io.c - Find pty_init(), insert "mxser_init()" as + 3.5.4 Setup kernel configuration - pty_init(); - mxser_init(); + Configure the kernel: - 4. Modify tty.h - # cd /usr/src/linux/include/linux - # vi tty.h - Find extern int tty_init(void), insert "mxser_init()" as + # cd /usr/src/linux + # make menuconfig - extern int tty_init(void); - extern int mxser_init(void); - - 5. Modify Makefile - # cd /usr/src/linux/drivers/char - # vi Makefile - Find L_OBJS := tty_io.o ...... random.o, add - "mxser.o" at last of this line as - L_OBJS := tty_io.o ....... mxser.o + You will go into a menu-driven system. Please select [Character + devices][Non-standard serial port support], enable the [Moxa + SmartIO support] driver with "[*]" for built-in (not "[M]"), then + select [Exit] to exit this program. - 6. Rebuild kernel - The following are for Linux kernel rebuilding,for your reference only. + 3.5.5 Rebuild kernel + The following are for Linux kernel rebuilding, for your + reference only. For appropriate details, please refer to the Linux document. - If 'lilo' utility is installed, please use 'make zlilo' to rebuild - kernel. If 'lilo' is not installed, please follow the following steps. - a. cd /usr/src/linux - b. make clean /* take a few minutes */ - c. make bzImage /* take probably 10-20 minutes */ - d. Backup original boot kernel. /* optional step */ - e. cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz + b. make clean /* take a few minutes */ + c. make dep /* take a few minutes */ + d. make bzImage /* take probably 10-20 minutes */ + e. make install /* copy boot image to correct position */ f. Please make sure the boot kernel (vmlinuz) is in the - correct position. If you use 'lilo' utility, you should - check /etc/lilo.conf 'image' item specified the path - which is the 'vmlinuz' path, or you will load wrong - (or old) boot kernel image (vmlinuz). - g. chmod 400 /vmlinuz - h. lilo - i. rdev -R /vmlinuz 1 - j. sync - - Note that if the result of "make zImage" is ERROR, then you have to - go back to Linux configuration Setup. Type "make config" in directory - /usr/src/linux or "setup". - - Since system include file, /usr/src/linux/include/linux/interrupt.h, - is modified each time the MOXA driver is installed, kernel rebuilding - is inevitable. And it takes about 10 to 20 minutes depends on the - machine. - - 7. Make utility - # cd /moxa/mxser/utility - # make install - - 8. Make special file + correct position. + g. If you use 'lilo' utility, you should check /etc/lilo.conf + 'image' item specified the path which is the 'vmlinuz' path, + or you will load wrong (or old) boot kernel image (vmlinuz). + After checking /etc/lilo.conf, please run "lilo". + + Note that if the result of "make bzImage" is ERROR, then you have to + go back to Linux configuration Setup. Type "make menuconfig" in + directory /usr/src/linux. + + + 3.5.6 Make tty device and special file # cd /moxa/mxser/driver # ./msmknod - 9. Reboot + 3.5.7 Make utility + # cd /moxa/mxser/utility + # make clean; make install + + 3.5.8 Reboot - 3.5 Custom configuration + + + 3.6 Custom configuration Although this driver already provides you default configuration, you - still can change the device name and major number.The instruction to + still can change the device name and major number. The instruction to change these parameters are shown as below. Change Device name @@ -306,33 +403,37 @@ Content 2 free major numbers for this driver. There are 3 steps to change major numbers. - 1. Find free major numbers + 3.6.1 Find free major numbers In /proc/devices, you may find all the major numbers occupied in the system. Please select 2 major numbers that are available. e.g. 40, 45. - 2. Create special files + 3.6.2 Create special files Run /moxa/mxser/driver/msmknod to create special files with specified major numbers. - 3. Modify driver with new major number + 3.6.3 Modify driver with new major number Run vi to open /moxa/mxser/driver/mxser.c. Locate the line contains "MXSERMAJOR". Change the content as below. #define MXSERMAJOR 40 #define MXSERCUMAJOR 45 - 4. Run # make install in /moxa/mxser/driver. + 3.6.4 Run "make clean; make install" in /moxa/mxser/driver. - 3.6 Verify driver installation + 3.7 Verify driver installation You may refer to /var/log/messages to check the latest status log reported by this driver whenever it's activated. + ----------------------------------------------------------------------------- 4. Utilities There are 3 utilities contained in this driver. They are msdiag, msmon and msterm. These 3 utilities are released in form of source code. They should be compiled into executable file and copied into /usr/bin. + Before using these utilities, please load driver (refer 3.4 & 3.5) and + make sure you had run the "msmknod" utility. + msdiag - Diagnostic -------------------- - This utility provides the function to detect what Moxa Smartio multiport - board exists in the system. + This utility provides the function to display what Moxa Smartio/Industio + board found by driver in the system. msmon - Port Monitoring ----------------------- @@ -353,12 +454,13 @@ Content application, for example, sending AT command to a modem connected to the port or used as a terminal for login purpose. Note that this is only a dumb terminal emulation without handling full screen operation. + ----------------------------------------------------------------------------- 5. Setserial Supported Setserial parameters are listed as below. - uart set UART type(16450-->disable FIFO, 16550A-->enable FIFO) + uart set UART type(16450-->disable FIFO, 16550A-->enable FIFO) close_delay set the amount of time(in 1/100 of a second) that DTR should be kept low while being closed. closing_wait set the amount of time(in 1/100 of a second) that the @@ -366,7 +468,13 @@ Content being closed, before the receiver is disable. spd_hi Use 57.6kb when the application requests 38.4kb. spd_vhi Use 115.2kb when the application requests 38.4kb. + spd_shi Use 230.4kb when the application requests 38.4kb. + spd_warp Use 460.8kb when the application requests 38.4kb. spd_normal Use 38.4kb when the application requests 38.4kb. + spd_cust Use the custom divisor to set the speed when the + application requests 38.4kb. + divisor This option set the custom divison. + baud_base This option set the base baud rate. ----------------------------------------------------------------------------- 6. Troubleshooting @@ -375,8 +483,9 @@ Content possible. If all the possible solutions fail, please contact our technical support team to get more help. - Error msg: More than 4 Moxa Smartio family boards found. Fifth board and - after are ignored. + + Error msg: More than 4 Moxa Smartio/Industio family boards found. Fifth board + and after are ignored. Solution: To avoid this problem, please unplug fifth and after board, because Moxa driver supports up to 4 boards. @@ -384,7 +493,7 @@ Content Error msg: Request_irq fail, IRQ(?) may be conflict with another device. Solution: Other PCI or ISA devices occupy the assigned IRQ. If you are not sure - which device causes the situation,please check /proc/interrupts to find + which device causes the situation, please check /proc/interrupts to find free IRQ and simply change another free IRQ for Moxa board. Error msg: Board #: C1xx Series(CAP=xxx) interrupt number invalid. @@ -397,15 +506,18 @@ Content Moxa ISA board needs an interrupt vector.Please refer to user's manual "Hardware Installation" chapter to set interrupt vector. - Error msg: Couldn't install MOXA Smartio family driver! + Error msg: Couldn't install MOXA Smartio/Industio family driver! Solution: Load Moxa driver fail, the major number may conflict with other devices. - Please refer to previous section 3.5 to change a free major number for + Please refer to previous section 3.7 to change a free major number for Moxa driver. - Error msg: Couldn't install MOXA Smartio family callout driver! + Error msg: Couldn't install MOXA Smartio/Industio family callout driver! Solution: Load Moxa callout driver fail, the callout device major number may - conflict with other devices. Please refer to previous section 3.5 to + conflict with other devices. Please refer to previous section 3.7 to change a free callout device major number for Moxa driver. + + ----------------------------------------------------------------------------- + diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index a0cda062bc33b6e1de3dd2bb60d5ff62bdf97cd4..688dfe1e6b70f75a36d84f30dda2234c53664be0 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -289,35 +289,73 @@ downdelay fail_over_mac Specifies whether active-backup mode should set all slaves to - the same MAC address (the traditional behavior), or, when - enabled, change the bond's MAC address when changing the - active interface (i.e., fail over the MAC address itself). - - Fail over MAC is useful for devices that cannot ever alter - their MAC address, or for devices that refuse incoming - broadcasts with their own source MAC (which interferes with - the ARP monitor). - - The down side of fail over MAC is that every device on the - network must be updated via gratuitous ARP, vs. just updating - a switch or set of switches (which often takes place for any - traffic, not just ARP traffic, if the switch snoops incoming - traffic to update its tables) for the traditional method. If - the gratuitous ARP is lost, communication may be disrupted. - - When fail over MAC is used in conjuction with the mii monitor, - devices which assert link up prior to being able to actually - transmit and receive are particularly susecptible to loss of - the gratuitous ARP, and an appropriate updelay setting may be - required. - - A value of 0 disables fail over MAC, and is the default. A - value of 1 enables fail over MAC. This option is enabled - automatically if the first slave added cannot change its MAC - address. This option may be modified via sysfs only when no - slaves are present in the bond. - - This option was added in bonding version 3.2.0. + the same MAC address at enslavement (the traditional + behavior), or, when enabled, perform special handling of the + bond's MAC address in accordance with the selected policy. + + Possible values are: + + none or 0 + + This setting disables fail_over_mac, and causes + bonding to set all slaves of an active-backup bond to + the same MAC address at enslavement time. This is the + default. + + active or 1 + + The "active" fail_over_mac policy indicates that the + MAC address of the bond should always be the MAC + address of the currently active slave. The MAC + address of the slaves is not changed; instead, the MAC + address of the bond changes during a failover. + + This policy is useful for devices that cannot ever + alter their MAC address, or for devices that refuse + incoming broadcasts with their own source MAC (which + interferes with the ARP monitor). + + The down side of this policy is that every device on + the network must be updated via gratuitous ARP, + vs. just updating a switch or set of switches (which + often takes place for any traffic, not just ARP + traffic, if the switch snoops incoming traffic to + update its tables) for the traditional method. If the + gratuitous ARP is lost, communication may be + disrupted. + + When this policy is used in conjuction with the mii + monitor, devices which assert link up prior to being + able to actually transmit and receive are particularly + susecptible to loss of the gratuitous ARP, and an + appropriate updelay setting may be required. + + follow or 2 + + The "follow" fail_over_mac policy causes the MAC + address of the bond to be selected normally (normally + the MAC address of the first slave added to the bond). + However, the second and subsequent slaves are not set + to this MAC address while they are in a backup role; a + slave is programmed with the bond's MAC address at + failover time (and the formerly active slave receives + the newly active slave's MAC address). + + This policy is useful for multiport devices that + either become confused or incur a performance penalty + when multiple ports are programmed with the same MAC + address. + + + The default policy is none, unless the first slave cannot + change its MAC address, in which case the active policy is + selected by default. + + This option may be modified via sysfs only when no slaves are + present in the bond. + + This option was added in bonding version 3.2.0. The "follow" + policy was added in bonding version 3.3.0. lacp_rate @@ -338,7 +376,8 @@ max_bonds Specifies the number of bonding devices to create for this instance of the bonding driver. E.g., if max_bonds is 3, and the bonding driver is not already loaded, then bond0, bond1 - and bond2 will be created. The default value is 1. + and bond2 will be created. The default value is 1. Specifying + a value of 0 will load bonding, but will not create any devices. miimon @@ -501,6 +540,17 @@ mode swapped with the new curr_active_slave that was chosen. +num_grat_arp + + Specifies the number of gratuitous ARPs to be issued after a + failover event. One gratuitous ARP is issued immediately after + the failover, subsequent ARPs are sent at a rate of one per link + monitor interval (arp_interval or miimon, whichever is active). + + The valid range is 0 - 255; the default value is 1. This option + affects only the active-backup mode. This option was added for + bonding version 3.3.0. + primary A string (eth0, eth2, etc) specifying which slave is the @@ -581,7 +631,7 @@ xmit_hash_policy in environments where a layer3 gateway device is required to reach most destinations. - This algorithm is 802.3ad complient. + This algorithm is 802.3ad compliant. layer3+4 diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index 641d2afacffa33094a5ee7a52e6646ec7c089220..297ba7b1ccaf953ff85e486baa0a7547450db31c 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -186,7 +186,7 @@ solution for a couple of reasons: The Linux network devices (by default) just can handle the transmission and reception of media dependent frames. Due to the - arbritration on the CAN bus the transmission of a low prio CAN-ID + arbitration on the CAN bus the transmission of a low prio CAN-ID may be delayed by the reception of a high prio CAN frame. To reflect the correct* traffic on the node the loopback of the sent data has to be performed right after a successful transmission. If @@ -481,7 +481,7 @@ solution for a couple of reasons: - stats_timer: To calculate the Socket CAN core statistics (e.g. current/maximum frames per second) this 1 second timer is invoked at can.ko module start time by default. This timer can be - disabled by using stattimer=0 on the module comandline. + disabled by using stattimer=0 on the module commandline. - debug: (removed since SocketCAN SVN r546) diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/dm9000.txt new file mode 100644 index 0000000000000000000000000000000000000000..65df3dea55613bd581dd787305054b7062657507 --- /dev/null +++ b/Documentation/networking/dm9000.txt @@ -0,0 +1,167 @@ +DM9000 Network driver +===================== + +Copyright 2008 Simtec Electronics, + Ben Dooks + + +Introduction +------------ + +This file describes how to use the DM9000 platform-device based network driver +that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h. + +The driver supports three DM9000 variants, the DM9000E which is the first chip +supported as well as the newer DM9000A and DM9000B devices. It is currently +maintained and tested by Ben Dooks, who should be CC: to any patches for this +driver. + + +Defining the platform device +---------------------------- + +The minimum set of resources attached to the platform device are as follows: + + 1) The physical address of the address register + 2) The physical address of the data register + 3) The IRQ line the device's interrupt pin is connected to. + +These resources should be specified in that order, as the ordering of the +two address regions is important (the driver expects these to be address +and then data). + +An example from arch/arm/mach-s3c2410/mach-bast.c is: + +static struct resource bast_dm9k_resource[] = { + [0] = { + .start = S3C2410_CS5 + BAST_PA_DM9000, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 3, + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f, + .flags = IORESOURCE_MEM, + }, + [2] = { + .start = IRQ_DM9000, + .end = IRQ_DM9000, + .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL, + } +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, +}; + +Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags, +as this will generate a warning if it is not present. The trigger from +the flags field will be passed to request_irq() when registering the IRQ +handler to ensure that the IRQ is setup correctly. + +This shows a typical platform device, without the optional configuration +platform data supplied. The next example uses the same resources, but adds +the optional platform data to pass extra configuration data: + +static struct dm9000_plat_data bast_dm9k_platdata = { + .flags = DM9000_PLATF_16BITONLY, +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, + .dev = { + .platform_data = &bast_dm9k_platdata, + } +}; + +The platform data is defined in include/linux/dm9000.h and described below. + + +Platform data +------------- + +Extra platform data for the DM9000 can describe the IO bus width to the +device, whether or not an external PHY is attached to the device and +the availability of an external configuration EEPROM. + +The flags for the platform data .flags field are as follows: + +DM9000_PLATF_8BITONLY + + The IO should be done with 8bit operations. + +DM9000_PLATF_16BITONLY + + The IO should be done with 16bit operations. + +DM9000_PLATF_32BITONLY + + The IO should be done with 32bit operations. + +DM9000_PLATF_EXT_PHY + + The chip is connected to an external PHY. + +DM9000_PLATF_NO_EEPROM + + This can be used to signify that the board does not have an + EEPROM, or that the EEPROM should be hidden from the user. + +DM9000_PLATF_SIMPLE_PHY + + Switch to using the simpler PHY polling method which does not + try and read the MII PHY state regularly. This is only available + when using the internal PHY. See the section on link state polling + for more information. + + The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry + "Force simple NSR based PHY polling" allows this flag to be + forced on at build time. + + +PHY Link state polling +---------------------- + +The driver keeps track of the link state and informs the network core +about link (carrier) availablilty. This is managed by several methods +depending on the version of the chip and on which PHY is being used. + +For the internal PHY, the original (and currently default) method is +to read the MII state, either when the status changes if we have the +necessary interrupt support in the chip or every two seconds via a +periodic timer. + +To reduce the overhead for the internal PHY, there is now the option +of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY +platform data option to read the summary information without the +expensive MII accesses. This method is faster, but does not print +as much information. + +When using an external PHY, the driver currently has to poll the MII +link status as there is no method for getting an interrupt on link change. + + +DM9000A / DM9000B +----------------- + +These chips are functionally similar to the DM9000E and are supported easily +by the same driver. The features are: + + 1) Interrupt on internal PHY state change. This means that the periodic + polling of the PHY status may be disabled on these devices when using + the internal PHY. + + 2) TCP/UDP checksum offloading, which the driver does not currently support. + + +ethtool +------- + +The driver supports the ethtool interface for access to the driver +state information, the PHY state and the EEPROM. diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt index 61b171cf5313c9b94c3e4fe185a9ff9edf9b5cbc..2df71861e578b6ebcb2b94371d4a32fbdf7fca61 100644 --- a/Documentation/networking/e1000.txt +++ b/Documentation/networking/e1000.txt @@ -513,21 +513,11 @@ Additional Configurations Intel(R) PRO/1000 PT Dual Port Server Connection Intel(R) PRO/1000 PT Dual Port Server Adapter Intel(R) PRO/1000 PF Dual Port Server Adapter - Intel(R) PRO/1000 PT Quad Port Server Adapter + Intel(R) PRO/1000 PT Quad Port Server Adapter NAPI ---- - NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled - or disabled based on the configuration of the kernel. To override - the default, use the following compile-time flags. - - To enable NAPI, compile the driver module, passing in a configuration option: - - make CFLAGS_EXTRA=-DE1000_NAPI install - - To disable NAPI, compile the driver module, passing in a configuration option: - - make CFLAGS_EXTRA=-DE1000_NO_NAPI install + NAPI (Rx polling mode) is enabled in the e1000 driver. See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 946b66e1b65214f4becc5d2563f73180fe331837..d84932650fd3e926cc11d5fa0bcaee6cc8a8cd33 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -551,8 +551,9 @@ icmp_echo_ignore_broadcasts - BOOLEAN icmp_ratelimit - INTEGER Limit the maximal rates for sending ICMP packets whose type matches icmp_ratemask (see below) to specific targets. - 0 to disable any limiting, otherwise the maximal rate in jiffies(1) - Default: 100 + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 icmp_ratemask - INTEGER Mask made of ICMP types for which rates are being limited. @@ -1023,11 +1024,23 @@ max_addresses - INTEGER autoconfigured addresses. Default: 16 +disable_ipv6 - BOOLEAN + Disable IPv6 operation. + Default: FALSE (enable IPv6 operation) + +accept_dad - INTEGER + Whether to accept DAD (Duplicate Address Detection). + 0: Disable DAD + 1: Enable DAD (default) + 2: Enable DAD, and disable IPv6 operation if MAC-based duplicate + link-local address has been found. + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. - 0 to disable any limiting, otherwise the maximal rate in jiffies(1) - Default: 100 + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 IPv6 Update by: diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt index 7c98277777ebcc14ee7c30991873cc6c92296d28..a0d0ffb5e584c1ae1b24a49552bfaef4c3b4684a 100644 --- a/Documentation/networking/ixgb.txt +++ b/Documentation/networking/ixgb.txt @@ -1,7 +1,7 @@ -Linux* Base Driver for the Intel(R) PRO/10GbE Family of Adapters -================================================================ +Linux Base Driver for 10 Gigabit Intel(R) Network Connection +============================================================= -November 17, 2004 +October 9, 2007 Contents @@ -9,94 +9,151 @@ Contents - In This Release - Identifying Your Adapter +- Building and Installation - Command Line Parameters - Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting - Support + In This Release =============== -This file describes the Linux* Base Driver for the Intel(R) PRO/10GbE Family -of Adapters, version 1.0.x. +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and ifconfig to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. -For questions related to hardware requirements, refer to the documentation -supplied with your Intel PRO/10GbE adapter. All hardware requirements listed -apply to use with Linux. Identifying Your Adapter ======================== -To verify your Intel adapter is supported, find the board ID number on the -adapter. Look for a label that has a barcode and a number in the format -A12345-001. +The following Intel network adapters are compatible with the drivers in this +release: + +Controller Adapter Name Physical Layer +---------- ------------ -------------- +82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) + Server Adapters 10G Base-SR (850 nm optical fiber) + 10G Base-CX4(twin-axial copper cabling) + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/network/sb/CS-012904.htm + + +Building and Installation +========================= + +select m for "Intel(R) PRO/10GbE support" located at: + Location: + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) +1. make modules && make modules_install + +2. Load the module: + +    modprobe ixgb = + + The insmod command can be used if the full + path to the driver module is specified. For example: + + insmod /lib/modules//kernel/drivers/net/ixgb/ixgb.ko + + With 2.6 based kernels also make sure that older ixgb drivers are + removed from the kernel, before loading the new module: -Use the above information and the Adapter & Driver ID Guide at: + rmmod ixgb; modprobe ixgb - http://support.intel.com/support/network/adapter/pro100/21397.htm +3. Assign an IP address to the interface by entering the following, where + x is the interface number: -For the latest Intel network drivers for Linux, go to: + ifconfig ethx + +4. Verify that the interface works. Enter the following, where + is the IP address for another machine on the same subnet as the interface + that is being tested: + + ping - http://downloadfinder.intel.com/scripts-df/support_intel.asp Command Line Parameters ======================= -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe or insmod command -using this syntax: +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax: modprobe ixgb [