Skip to content
Snippets Groups Projects
  1. Jan 05, 2011
  2. Nov 30, 2010
  3. Apr 08, 2010
  4. Feb 10, 2010
  5. Jan 19, 2010
  6. Sep 27, 2009
  7. Sep 24, 2009
  8. Sep 23, 2009
  9. Sep 22, 2009
    • Eric B Munson's avatar
      hugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal mount · 6bfde05b
      Eric B Munson authored
      
      This patchset adds a flag to mmap that allows the user to request that an
      anonymous mapping be backed with huge pages.  This mapping will borrow
      functionality from the huge page shm code to create a file on the kernel
      internal mount and use it to approximate an anonymous mapping.  The
      MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
      both flags being preset.
      
      A new flag is necessary because there is no other way to hook into huge
      pages without creating a file on a hugetlbfs mount which wouldn't be
      MAP_ANONYMOUS.
      
      To userspace, this mapping will behave just like an anonymous mapping
      because the file is not accessible outside of the kernel.
      
      This patchset is meant to simplify the programming model.  Presently there
      is a large chunk of boiler platecode, contained in libhugetlbfs, required
      to create private, hugepage backed mappings.  This patch set would allow
      use of hugepages without linking to libhugetlbfs or having hugetblfs
      mounted.
      
      Unification of the VM code would provide these same benefits, but it has
      been resisted each time that it has been suggested for several reasons: it
      would break PAGE_SIZE assumptions across the kernel, it makes page-table
      abstractions really expensive, and it does not provide any benefit on
      architectures that do not support huge pages, incurring fast path
      penalties without providing any benefit on these architectures.
      
      This patch:
      
      There are two means of creating mappings backed by huge pages:
      
              1. mmap() a file created on hugetlbfs
              2. Use shm which creates a file on an internal mount which essentially
                 maps it MAP_SHARED
      
      The internal mount is only used for shared mappings but there is very
      little that stops it being used for private mappings. This patch extends
      hugetlbfs_file_setup() to deal with the creation of files that will be
      mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
      used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.
      
      Signed-off-by: default avatarEric Munson <ebmunson@us.ibm.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6bfde05b
    • Alexey Dobriyan's avatar
  10. Sep 15, 2009
  11. Aug 24, 2009
    • Hugh Dickins's avatar
      mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30
      Hugh Dickins authored
      
      2.6.30's commit 8a0bdec1 removed
      user_shm_lock() calls in hugetlb_file_setup() but left the
      user_shm_unlock call in shm_destroy().
      
      In detail:
      Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
      is not called in hugetlb_file_setup(). However, user_shm_unlock() is
      called in any case in shm_destroy() and in the following
      atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
      up->__count gets zero, also cleanup_user_struct() is scheduled.
      
      Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
      However, the ref counter up->__count gets unexpectedly non-positive and
      the corresponding structs are freed even though there are live
      references to them, resulting in a kernel oops after a lots of
      shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.
      
      Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
      time of shm_destroy() may give a different answer from at the time
      of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
      which has missed user_shm_unlock() ever since it came in 2.6.9.
      
      Reported-by: default avatarStefan Huber <shuber2@gmail.com>
      Signed-off-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Tested-by: default avatarStefan Huber <shuber2@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      353d5c30
  12. Jun 29, 2009
  13. Jun 21, 2009
  14. Jun 18, 2009
  15. Jun 10, 2009
  16. May 22, 2009
  17. Apr 15, 2009
  18. Apr 14, 2009
  19. Apr 07, 2009
    • Serge E. Hallyn's avatar
      namespaces: mqueue namespace: adapt sysctl · bdc8e5f8
      Serge E. Hallyn authored
      
      Largely inspired from ipc/ipc_sysctl.c.  This patch isolates the mqueue
      sysctl stuff in its own file.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarNadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bdc8e5f8
    • Serge E. Hallyn's avatar
      namespaces: ipc namespaces: implement support for posix msqueues · 7eafd7c7
      Serge E. Hallyn authored
      
      Implement multiple mounts of the mqueue file system, and link it to usage
      of CLONE_NEWIPC.
      
      Each ipc ns has a corresponding mqueuefs superblock.  When a user does
      clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
      internal mount of a new mqueuefs sb linked to the new ipc ns.
      
      When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
      mqueuefs superblock.
      
      Posix message queues can be worked with both through the mq_* system calls
      (see mq_overview(7)), and through the VFS through the mqueue mount.  Any
      usage of mq_open() and friends will work with the acting task's ipc
      namespace.  Any actions through the VFS will work with the mqueuefs in
      which the file was created.  So if a user doesn't remount mqueuefs after
      unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
      /dev/mqueue".
      
      If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
      ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
      ipc_ns:1 will be freed, (2) it's superblock will live on until task b
      umounts the corresponding mqueuefs, and vfs actions will continue to
      succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
      the deceased ipc_ns:1.
      
      To make this happen, we must protect the ipc reference count when
      
      a) a task exits and drops its ipcns->count, since it might be dropping
         it to 0 and freeing the ipcns
      
      b) a task accesses the ipcns through its mqueuefs interface, since it
         bumps the ipcns refcount and might race with the last task in the ipcns
         exiting.
      
      So the kref is changed to an atomic_t so we can use
      atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
      through ns = mqueuefs_sb->s_fs_info is protected by the same lock.
      
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7eafd7c7
    • Serge E. Hallyn's avatar
      namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace · 614b84cf
      Serge E. Hallyn authored
      
      Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
      The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
      posix message queue namespaces and the SYSV ipc namespaces.
      
      The sysctl code will be fixed separately in patch 3.  After just this
      patch, making a change to posix mqueue tunables always changes the values
      in the initial ipc namespace.
      
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      614b84cf
  20. Apr 03, 2009
  21. Apr 01, 2009
  22. Mar 16, 2009
    • Jonathan Corbet's avatar
      Use f_lock to protect f_flags · db1dd4d3
      Jonathan Corbet authored
      
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      db1dd4d3
  23. Feb 10, 2009
    • Mel Gorman's avatar
      Do not account for the address space used by hugetlbfs using VM_ACCOUNT · 5a6fe125
      Mel Gorman authored
      
      When overcommit is disabled, the core VM accounts for pages used by anonymous
      shared, private mappings and special mappings. It keeps track of VMAs that
      should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
      with VM_NORESERVE.
      
      Overcommit for hugetlbfs is much riskier than overcommit for base pages
      due to contiguity requirements. It avoids overcommiting on both shared and
      private mappings using reservation counters that are checked and updated
      during mmap(). This ensures (within limits) that hugepages exist in the
      future when faults occurs or it is too easy to applications to be SIGKILLed.
      
      As hugetlbfs makes its own reservations of a different unit to the base page
      size, VM_ACCOUNT should never be set. Even if the units were correct, we would
      double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
      be set because an application can request no reserves be made for hugetlbfs
      at the risk of getting killed later.
      
      With commit fc8744ad, VM_NORESERVE and
      VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
      breaks the accounting for both the core VM and hugetlbfs, can trigger an
      OOM storm when hugepage pools are too small lockups and corrupted counters
      otherwise are used. This patch brings hugetlbfs more in line with how the
      core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.
      
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5a6fe125
  24. Feb 05, 2009
  25. Feb 01, 2009
    • Linus Torvalds's avatar
      Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Linus Torvalds authored
      
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This just resulted in vma merge issues, but also made for just
      unnecessary confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fc8744ad
  26. Jan 14, 2009
Loading