Diffstat (limited to 'Documentation/vm')

 Documentation/vm/balance               |  14
 Documentation/vm/page_migration        |  27
 Documentation/vm/slub.txt              |  59
 Documentation/vm/split_page_table_lock |   4
 Documentation/vm/transhuge.txt         |  10
 Documentation/vm/unevictable-lru.txt   | 120

 6 files changed, 110 insertions(+), 124 deletions(-)
diff --git a/Documentation/vm/balance b/Documentation/vm/balance
index c46e68cf9..964595481 100644
--- a/Documentation/vm/balance
+++ b/Documentation/vm/balance
@@ -1,12 +1,14 @@
Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
-Memory balancing is needed for non __GFP_WAIT as well as for non
-__GFP_IO allocations.
+Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+well as for non __GFP_IO allocations.
-There are two reasons to be requesting non __GFP_WAIT allocations:
-the caller can not sleep (typically intr context), or does not want
-to incur cost overheads of page stealing and possible swap io for
-whatever reasons.
+The first reason why a caller may avoid reclaim is that the caller can not
+sleep due to holding a spinlock or is in interrupt context. The second may
+be that the caller is willing to fail the allocation without incurring the
+overhead of page reclaim. This may happen for opportunistic high-order
+allocation requests that have order-0 fallback options. In such cases,
+the caller may also wish to avoid waking kswapd.
__GFP_IO allocation requests are made to prevent file system deadlocks.
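To make these options concrete, here is a minimal C sketch, illustrative only and not part of this patch (the helper name is hypothetical), of an opportunistic high-order allocation that performs no direct reclaim and does not wake kswapd, with an order-0 fallback:

	/*
	 * Hypothetical helper, for illustration only: attempt a high-order
	 * allocation that neither direct-reclaims nor wakes kswapd, then
	 * fall back to an order-0 page with normal reclaim behaviour.
	 */
	static void *alloc_big_buffer(unsigned int order, unsigned int *got_order)
	{
		gfp_t gfp = (GFP_KERNEL | __GFP_NOWARN) &
			    ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
		struct page *page;

		page = alloc_pages(gfp, order);	/* fails fast under pressure */
		*got_order = order;
		if (!page) {
			page = alloc_pages(GFP_KERNEL, 0);	/* order-0 fallback */
			*got_order = 0;
		}
		return page ? page_address(page) : NULL;
	}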
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 6513fe2d9..fea5c0864 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -92,29 +92,26 @@ Steps:
2. Ensure that writeback is complete.
-3. Prep the new page that we want to move to. It is locked
- and set to not being uptodate so that all accesses to the new
- page immediately lock while the move is in progress.
+3. Lock the new page that we want to move to. It is locked so that accesses to
+ this (not yet uptodate) page immediately lock while the move is in progress.
-4. The new page is prepped with some settings from the old page so that
- accesses to the new page will discover a page with the correct settings.
-
-5. All the page table references to the page are converted
- to migration entries or dropped (nonlinear vmas).
- This decrease the mapcount of a page. If the resulting
- mapcount is not zero then we do not migrate the page.
- All user space processes that attempt to access the page
- will now wait on the page lock.
+4. All the page table references to the page are converted to migration
+ entries. This decreases the mapcount of a page. If the resulting
+ mapcount is not zero then we do not migrate the page. All user space
+ processes that attempt to access the page will now wait on the page lock.
-6. The radix tree lock is taken. This will cause all processes trying
+5. The radix tree lock is taken. This will cause all processes trying
to access the page via the mapping to block on the radix tree spinlock.
-7. The refcount of the page is examined and we back out if references remain
+6. The refcount of the page is examined and we back out if references remain
otherwise we know that we are the only one referencing this page.
-8. The radix tree is checked and if it does not contain the pointer to this
+7. The radix tree is checked and if it does not contain the pointer to this
page then we back out because someone else modified the radix tree.
+8. The new page is prepped with some settings from the old page so that
+ accesses to the new page will discover a page with the correct settings.
+
9. The radix tree is changed to point to the new page.
10. The reference count of the old page is dropped because the radix tree
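As a rough illustration of how steps 5 to 9 above fit together, the following hedged sketch is loosely modeled on migrate_page_move_mapping(); the function name, arguments and error handling are simplified assumptions, not the kernel's actual code:

	/*
	 * Hedged sketch of steps 5-9, loosely modeled on
	 * migrate_page_move_mapping(); simplified, not the real code.
	 */
	static int move_mapping_sketch(struct address_space *mapping,
				       struct page *newpage, struct page *page,
				       int expected_count)
	{
		void **pslot;

		spin_lock_irq(&mapping->tree_lock);		/* step 5 */

		pslot = radix_tree_lookup_slot(&mapping->page_tree,
					       page_index(page));

		if (page_count(page) != expected_count ||	/* step 6 */
		    radix_tree_deref_slot_protected(pslot,
				&mapping->tree_lock) != page) {	/* step 7 */
			spin_unlock_irq(&mapping->tree_lock);
			return -EAGAIN;		/* back out */
		}

		/* step 8: carry settings over so lookups see a sane page */
		newpage->index = page->index;
		newpage->mapping = page->mapping;

		radix_tree_replace_slot(pslot, newpage);	/* step 9 */
		spin_unlock_irq(&mapping->tree_lock);
		return 0;
	}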
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index b0c6d1bbb..699d8ea5c 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -280,4 +280,63 @@ of other objects.
slub_debug=FZ,dentry
+Extended slabinfo mode and plotting
+-----------------------------------
+
+The slabinfo tool has a special 'extended' ('-X') mode that includes:
+ - Slabcache Totals
+ - Slabs sorted by size (up to -N <num> slabs, default 1)
+ - Slabs sorted by loss (up to -N <num> slabs, default 1)
+
+Additionally, in this mode slabinfo does not dynamically scale sizes (G/M/K)
+and reports everything in bytes (this functionality is also available to
+other slabinfo modes via the '-B' option), which makes reporting more
+precise and accurate. Moreover, the `-X' mode also simplifies the analysis
+of slabs' behaviour, because its output can be plotted using the
+slabinfo-gnuplot.sh script. So it pushes the analysis from looking through
+the numbers (tons of numbers) to something easier -- visual analysis.
+
+To generate plots:
+a) collect slabinfo extended records, for example:
+
+ while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done
+
+b) pass stats file(-s) to slabinfo-gnuplot.sh script:
+ slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]
+
+The slabinfo-gnuplot.sh script will pre-process the collected records
+and generate 3 png files (and 3 pre-processing cache files) per STATS
+file:
+ - Slabcache Totals: FOO_STATS-totals.png
+ - Slabs sorted by size: FOO_STATS-slabs-by-size.png
+ - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png
+
+Another use case where slabinfo-gnuplot.sh can be useful is when you need
+to compare slabs' behaviour "prior to" and "after" some code modification.
+To help you out there, the slabinfo-gnuplot.sh script can 'merge' the
+`Slabcache Totals` sections from different measurements. To visually
+compare N plots:
+
+a) Collect as many STATS1, STATS2, .. STATSN files as you need
+ while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done
+
+b) Pre-process those STATS files
+ slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN
+
+c) Execute slabinfo-gnuplot.sh in '-t' mode, passing all of the
+generated pre-processed *-totals
+ slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals
+
+This will produce a single plot (png file).
+
+As expected, plots can be large, so some fluctuations or small spikes
+can go unnoticed. To deal with that, `slabinfo-gnuplot.sh' has two
+options to 'zoom-in'/'zoom-out':
+ a) -s %d,%d overrides the default image width and height
+ b) -r %d,%d specifies a range of samples to use (for example,
+    in the `slabinfo -X >> FOO_STATS; sleep 1;' case, using
+    a "-r 40,60" range will plot only samples collected
+    between the 40th and 60th seconds).
+
Christoph Lameter, May 30, 2007
+Sergey Senozhatsky, October 23, 2015
diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock
index 6dea4fd5c..62842a857 100644
--- a/Documentation/vm/split_page_table_lock
+++ b/Documentation/vm/split_page_table_lock
@@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
which must be called on PTE table allocation / freeing.
Make sure the architecture doesn't use slab allocator for page table
-allocation: slab uses page->slab_cache and page->first_page for its pages.
-These fields share storage with page->ptl.
+allocation: slab uses page->slab_cache for its pages.
+This field shares storage with page->ptl.
PMD split lock only makes sense if you have more than two page table
levels.
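For illustration, a hedged sketch of a typical architecture's PTE table allocation and freeing, honouring the ctor/dtor contract described above (details vary per architecture; this is not any particular arch's code):

	/* Hedged per-arch sketch: allocate/free a PTE table page. */
	pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
	{
		struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

		if (!page)
			return NULL;
		if (!pgtable_page_ctor(page)) {	/* sets up page->ptl */
			__free_page(page);
			return NULL;
		}
		return page;
	}

	void pte_free(struct mm_struct *mm, pgtable_t page)
	{
		pgtable_page_dtor(page);	/* releases split ptl if any */
		__free_page(page);
	}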
diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 8143b9e83..8a282687e 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -170,6 +170,16 @@ A lower value leads to less thp performance gain. Value of
max_ptes_none wastes very little cpu time; you can
ignore it.
+max_ptes_swap specifies how many pages can be brought in from
+swap when collapsing a group of pages into a transparent huge page.
+
+/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
+
+A higher value can cause excessive swap IO and waste
+memory. A lower value can prevent THPs from being
+collapsed, resulting in fewer pages being collapsed into
+THPs, and lower memory access performance.
+
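For illustration, a small hedged userspace sketch that reads and updates this tunable through sysfs; the value 16 is an arbitrary example, not a recommendation, and writing requires root:

	/* Hedged userspace sketch: query and set max_ptes_swap. */
	#include <stdio.h>

	int main(void)
	{
		const char *path = "/sys/kernel/mm/transparent_hugepage/"
				   "khugepaged/max_ptes_swap";
		FILE *f = fopen(path, "r+");	/* "r+": read then write */
		int val;

		if (!f)
			return 1;
		if (fscanf(f, "%d", &val) == 1)
			printf("max_ptes_swap = %d\n", val);
		rewind(f);
		fprintf(f, "16\n");		/* arbitrary example value */
		return fclose(f) ? 1 : 0;
	}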
== Boot parameter ==
You can change the sysfs boot time defaults of Transparent Hugepage
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 32ee3a67d..fa3b52708 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -531,83 +531,20 @@ map.
try_to_unmap() is always called, by either vmscan for reclaim or by page
migration, with the argument page locked and isolated from the LRU. Separate
-functions handle anonymous and mapped file pages, as these types of pages have
-different reverse map mechanisms.
-
- (*) try_to_unmap_anon()
-
- To unmap anonymous pages, each VMA in the list anchored in the anon_vma
- must be visited - at least until a VM_LOCKED VMA is encountered. If the
- page is being unmapped for migration, VM_LOCKED VMAs do not stop the
- process because mlocked pages are migratable. However, for reclaim, if
- the page is mapped into a VM_LOCKED VMA, the scan stops.
-
- try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of
- the mm_struct to which the VMA belongs. If this is successful, it will
- mlock the page via mlock_vma_page() - we wouldn't have gotten to
- try_to_unmap_anon() if the page were already mlocked - and will return
- SWAP_MLOCK, indicating that the page is unevictable.
-
- If the mmap semaphore cannot be acquired, we are not sure whether the page
- is really unevictable or not. In this case, try_to_unmap_anon() will
- return SWAP_AGAIN.
-
- (*) try_to_unmap_file() - linear mappings
-
- Unmapping of a mapped file page works the same as for anonymous mappings,
- except that the scan visits all VMAs that map the page's index/page offset
- in the page's mapping's reverse map priority search tree. It also visits
- each VMA in the page's mapping's non-linear list, if the list is
- non-empty.
-
- As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file
- page, try_to_unmap_file() will attempt to acquire the associated
- mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this
- is successful, and SWAP_AGAIN, if not.
-
- (*) try_to_unmap_file() - non-linear mappings
-
- If a page's mapping contains a non-empty non-linear mapping VMA list, then
- try_to_un{map|lock}() must also visit each VMA in that list to determine
- whether the page is mapped in a VM_LOCKED VMA. Again, the scan must visit
- all VMAs in the non-linear list to ensure that the pages is not/should not
- be mlocked.
-
- If a VM_LOCKED VMA is found in the list, the scan could terminate.
- However, there is no easy way to determine whether the page is actually
- mapped in a given VMA - either for unmapping or testing whether the
- VM_LOCKED VMA actually pins the page.
-
- try_to_unmap_file() handles non-linear mappings by scanning a certain
- number of pages - a "cluster" - in each non-linear VMA associated with the
- page's mapping, for each file mapped page that vmscan tries to unmap. If
- this happens to unmap the page we're trying to unmap, try_to_unmap() will
- notice this on return (page_mapcount(page) will be 0) and return
- SWAP_SUCCESS. Otherwise, it will return SWAP_AGAIN, causing vmscan to
- recirculate this page. We take advantage of the cluster scan in
- try_to_unmap_cluster() as follows:
-
- For each non-linear VMA, try_to_unmap_cluster() attempts to acquire the
- mmap semaphore of the associated mm_struct for read without blocking.
-
- If this attempt is successful and the VMA is VM_LOCKED,
- try_to_unmap_cluster() will retain the mmap semaphore for the scan;
- otherwise it drops it here.
-
- Then, for each page in the cluster, if we're holding the mmap semaphore
- for a locked VMA, try_to_unmap_cluster() calls mlock_vma_page() to
- mlock the page. This call is a no-op if the page is already locked,
- but will mlock any pages in the non-linear mapping that happen to be
- unlocked.
-
- If one of the pages so mlocked is the page passed in to try_to_unmap(),
- try_to_unmap_cluster() will return SWAP_MLOCK, rather than the default
- SWAP_AGAIN. This will allow vmscan to cull the page, rather than
- recirculating it on the inactive list.
-
- Again, if try_to_unmap_cluster() cannot acquire the VMA's mmap sem, it
- returns SWAP_AGAIN, indicating that the page is mapped by a VM_LOCKED
- VMA, but couldn't be mlocked.
+functions handle anonymous, mapped file, and KSM pages, as these types of
+pages have different reverse map lookup mechanisms, with different locking.
+In each case, whether rmap_walk_anon() or rmap_walk_file() or rmap_walk_ksm(),
+it will call try_to_unmap_one() for every VMA which might contain the page.
+
+When trying to reclaim, if try_to_unmap_one() finds the page in a VM_LOCKED
+VMA, it will then mlock the page via mlock_vma_page() instead of unmapping it,
+and return SWAP_MLOCK to indicate that the page is unevictable; the scan
+stops there.
+
+mlock_vma_page() is called while holding the page table's lock (in addition
+to the page lock and the rmap lock), to serialize against concurrent mlock or
+munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
+holepunching, and truncation of file pages and their anonymous COWed pages.
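A hedged sketch of the VM_LOCKED handling just described, shaped like the core of try_to_unmap_one(); this is a simplification under stated assumptions, not the kernel's actual code:

	/*
	 * Hedged sketch, not the kernel's code: when reclaim (but not
	 * migration) meets a VM_LOCKED VMA, mlock the page and bail out.
	 * Assumes the page lock, rmap lock and pte lock are already held.
	 */
	static int unmap_one_sketch(struct page *page,
				    struct vm_area_struct *vma,
				    enum ttu_flags flags)
	{
		if ((vma->vm_flags & VM_LOCKED) && !(flags & TTU_MIGRATION)) {
			mlock_vma_page(page);	/* make it unevictable */
			return SWAP_MLOCK;	/* scan stops here */
		}
		/* ... otherwise convert/clear the PTE and unmap ... */
		return SWAP_AGAIN;
	}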
try_to_munlock() REVERSE MAP SCAN
@@ -623,29 +560,15 @@ all PTEs from the page. For this purpose, the unevictable/mlock infrastructure
introduced a variant of try_to_unmap() called try_to_munlock().
try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file pages with an additional argument specifying unlock versus unmap
+mapped file and KSM pages with a flag argument specifying unlock versus unmap
processing. Again, these functions walk the respective reverse maps looking
-for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file
-pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
-attempt to acquire the associated mmap semaphore, mlock the page via
-mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the
-pre-clearing of the page's PG_mlocked done by munlock_vma_page.
-
-If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap
-semaphore, it will return SWAP_AGAIN. This will allow shrink_page_list() to
-recycle the page on the inactive list and hope that it has better luck with the
-page next time.
-
-For file pages mapped into non-linear VMAs, the try_to_munlock() logic works
-slightly differently. On encountering a VM_LOCKED non-linear VMA that might
-map the page, try_to_munlock() returns SWAP_AGAIN without actually mlocking the
-page. munlock_vma_page() will just leave the page unlocked and let vmscan deal
-with it - the usual fallback position.
+for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case,
+the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This
+undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page().
Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
-However, the scan can terminate when it encounters a VM_LOCKED VMA and can
-successfully acquire the VMA's mmap semaphore for read and mlock the page.
+However, the scan can terminate when it encounters a VM_LOCKED VMA.
Although try_to_munlock() might be called a great many times when munlocking a
large region or tearing down a large address space that has been mlocked via
mlockall(), overall this is a fairly rare event.
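For illustration, a hedged sketch of how munlock_vma_page() might drive try_to_munlock(), following the description above; the flow is simplified and not the kernel's actual code:

	/*
	 * Hedged sketch, simplified: undo one VMA's mlock of a page and
	 * let the rmap walk decide whether the page stays unevictable.
	 */
	static void munlock_page_sketch(struct page *page)
	{
		if (!TestClearPageMlocked(page))	/* pre-clear PG_mlocked */
			return;

		if (isolate_lru_page(page))
			return;		/* let vmscan deal with it later */

		/*
		 * try_to_munlock() re-mlocks the page (SWAP_MLOCK) if some
		 * other VM_LOCKED VMA still maps it.
		 */
		try_to_munlock(page);
		putback_lru_page(page);	/* picks the right LRU list */
	}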
@@ -673,11 +596,6 @@ Some examples of these unevictable pages on the LRU lists are:
(3) mlocked pages that could not be isolated from the LRU and moved to the
unevictable list in mlock_vma_page().
- (4) Pages mapped into multiple VM_LOCKED VMAs, but try_to_munlock() couldn't
- acquire the VMA's mmap semaphore to test the flags and set PageMlocked.
- munlock_vma_page() was forced to let the page back on to the normal LRU
- list for vmscan to handle.
-
shrink_inactive_list() also diverts any unevictable pages that it finds on the
inactive lists to the appropriate zone's unevictable list.