summaryrefslogtreecommitdiff
path: root/Documentation/cpu-freq
diff options
context:
space:
mode:
authorAndré Fabian Silva Delgado <emulatorman@parabola.nu>2015-08-05 17:04:01 -0300
committerAndré Fabian Silva Delgado <emulatorman@parabola.nu>2015-08-05 17:04:01 -0300
commit57f0f512b273f60d52568b8c6b77e17f5636edc0 (patch)
tree5e910f0e82173f4ef4f51111366a3f1299037a7b /Documentation/cpu-freq
Initial import
Diffstat (limited to 'Documentation/cpu-freq')
-rw-r--r--Documentation/cpu-freq/amd-powernow.txt38
-rw-r--r--Documentation/cpu-freq/boost.txt93
-rw-r--r--Documentation/cpu-freq/core.txt123
-rw-r--r--Documentation/cpu-freq/cpu-drivers.txt274
-rw-r--r--Documentation/cpu-freq/cpufreq-nforce2.txt19
-rw-r--r--Documentation/cpu-freq/cpufreq-stats.txt128
-rw-r--r--Documentation/cpu-freq/governors.txt269
-rw-r--r--Documentation/cpu-freq/index.txt54
-rw-r--r--Documentation/cpu-freq/intel-pstate.txt64
-rw-r--r--Documentation/cpu-freq/pcc-cpufreq.txt207
-rw-r--r--Documentation/cpu-freq/user-guide.txt224
11 files changed, 1493 insertions, 0 deletions
diff --git a/Documentation/cpu-freq/amd-powernow.txt b/Documentation/cpu-freq/amd-powernow.txt
new file mode 100644
index 000000000..254da155f
--- /dev/null
+++ b/Documentation/cpu-freq/amd-powernow.txt
@@ -0,0 +1,38 @@
+
+PowerNow! and Cool'n'Quiet are AMD names for frequency
+management capabilities in AMD processors. As the hardware
+implementation changes in new generations of the processors,
+there is a different cpu-freq driver for each generation.
+
+Note that the driver's will not load on the "wrong" hardware,
+so it is safe to try each driver in turn when in doubt as to
+which is the correct driver.
+
+Note that the functionality to change frequency (and voltage)
+is not available in all processors. The drivers will refuse
+to load on processors without this capability. The capability
+is detected with the cpuid instruction.
+
+The drivers use BIOS supplied tables to obtain frequency and
+voltage information appropriate for a particular platform.
+Frequency transitions will be unavailable if the BIOS does
+not supply these tables.
+
+6th Generation: powernow-k6
+
+7th Generation: powernow-k7: Athlon, Duron, Geode.
+
+8th Generation: powernow-k8: Athlon, Athlon 64, Opteron, Sempron.
+Documentation on this functionality in 8th generation processors
+is available in the "BIOS and Kernel Developer's Guide", publication
+26094, in chapter 9, available for download from www.amd.com.
+
+BIOS supplied data, for powernow-k7 and for powernow-k8, may be
+from either the PSB table or from ACPI objects. The ACPI support
+is only available if the kernel config sets CONFIG_ACPI_PROCESSOR.
+The powernow-k8 driver will attempt to use ACPI if so configured,
+and fall back to PST if that fails.
+The powernow-k7 driver will try to use the PSB support first, and
+fall back to ACPI if the PSB support fails. A module parameter,
+acpi_force, is provided to force ACPI support to be used instead
+of PSB support.
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt
new file mode 100644
index 000000000..dd62e1334
--- /dev/null
+++ b/Documentation/cpu-freq/boost.txt
@@ -0,0 +1,93 @@
+Processor boosting control
+
+ - information for users -
+
+Quick guide for the impatient:
+--------------------
+/sys/devices/system/cpu/cpufreq/boost
+controls the boost setting for the whole system. You can read and write
+that file with either "0" (boosting disabled) or "1" (boosting allowed).
+Reading or writing 1 does not mean that the system is boosting at this
+very moment, but only that the CPU _may_ raise the frequency at it's
+discretion.
+--------------------
+
+Introduction
+-------------
+Some CPUs support a functionality to raise the operating frequency of
+some cores in a multi-core package if certain conditions apply, mostly
+if the whole chip is not fully utilized and below it's intended thermal
+budget. The decision about boost disable/enable is made either at hardware
+(e.g. x86) or software (e.g ARM).
+On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
+in technical documentation "Core performance boost". In Linux we use
+the term "boost" for convenience.
+
+Rationale for disable switch
+----------------------------
+
+Though the idea is to just give better performance without any user
+intervention, sometimes the need arises to disable this functionality.
+Most systems offer a switch in the (BIOS) firmware to disable the
+functionality at all, but a more fine-grained and dynamic control would
+be desirable:
+1. While running benchmarks, reproducible results are important. Since
+ the boosting functionality depends on the load of the whole package,
+ single thread performance can vary. By explicitly disabling the boost
+ functionality at least for the benchmark's run-time the system will run
+ at a fixed frequency and results are reproducible again.
+2. To examine the impact of the boosting functionality it is helpful
+ to do tests with and without boosting.
+3. Boosting means overclocking the processor, though under controlled
+ conditions. By raising the frequency and the voltage the processor
+ will consume more power than without the boosting, which may be
+ undesirable for instance for mobile users. Disabling boosting may
+ save power here, though this depends on the workload.
+
+
+User controlled switch
+----------------------
+
+To allow the user to toggle the boosting functionality, the cpufreq core
+driver exports a sysfs knob to enable or disable it. There is a file:
+/sys/devices/system/cpu/cpufreq/boost
+which can either read "0" (boosting disabled) or "1" (boosting enabled).
+The file is exported only when cpufreq driver supports boosting.
+Explicitly changing the permissions and writing to that file anyway will
+return EINVAL.
+
+On supported CPUs one can write either a "0" or a "1" into this file.
+This will either disable the boost functionality on all cores in the
+whole system (0) or will allow the software or hardware to boost at will
+(1).
+
+Writing a "1" does not explicitly boost the system, but just allows the
+CPU to boost at their discretion. Some implementations take external
+factors like the chip's temperature into account, so boosting once does
+not necessarily mean that it will occur every time even using the exact
+same software setup.
+
+
+AMD legacy cpb switch
+---------------------
+The AMD powernow-k8 driver used to support a very similar switch to
+disable or enable the "Core Performance Boost" feature of some AMD CPUs.
+This switch was instantiated in each CPU's cpufreq directory
+(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
+Though the per CPU existence hints at a more fine grained control, the
+actual implementation only supported a system-global switch semantics,
+which was simply reflected into each CPU's file. Writing a 0 or 1 into it
+would pull the other CPUs to the same state.
+For compatibility reasons this file and its behavior is still supported
+on AMD CPUs, though it is now protected by a config switch
+(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
+even with the config option set.
+This functionality is considered legacy and will be removed in some future
+kernel version.
+
+More fine grained boosting control
+----------------------------------
+
+Technically it is possible to switch the boosting functionality at least
+on a per package basis, for some CPUs even per core. Currently the driver
+does not support it, but this may be implemented in the future.
diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt
new file mode 100644
index 000000000..70933eadc
--- /dev/null
+++ b/Documentation/cpu-freq/core.txt
@@ -0,0 +1,123 @@
+ CPU frequency and voltage scaling code in the Linux(TM) kernel
+
+
+ L i n u x C P U F r e q
+
+ C P U F r e q C o r e
+
+
+ Dominik Brodowski <linux@brodo.de>
+ David Kimdon <dwhedon@debian.org>
+
+
+
+ Clock scaling allows you to change the clock speed of the CPUs on the
+ fly. This is a nice method to save battery power, because the lower
+ the clock speed, the less power the CPU consumes.
+
+
+Contents:
+---------
+1. CPUFreq core and interfaces
+2. CPUFreq notifiers
+3. CPUFreq Table Generation with Operating Performance Point (OPP)
+
+1. General Information
+=======================
+
+The CPUFreq core code is located in drivers/cpufreq/cpufreq.c. This
+cpufreq code offers a standardized interface for the CPUFreq
+architecture drivers (those pieces of code that do actual
+frequency transitions), as well as to "notifiers". These are device
+drivers or other part of the kernel that need to be informed of
+policy changes (ex. thermal modules like ACPI) or of all
+frequency changes (ex. timing code) or even need to force certain
+speed limits (like LCD drivers on ARM architecture). Additionally, the
+kernel "constant" loops_per_jiffy is updated on frequency changes
+here.
+
+Reference counting is done by cpufreq_get_cpu and cpufreq_put_cpu,
+which make sure that the cpufreq processor driver is correctly
+registered with the core, and will not be unloaded until
+cpufreq_put_cpu is called.
+
+2. CPUFreq notifiers
+====================
+
+CPUFreq notifiers conform to the standard kernel notifier interface.
+See linux/include/linux/notifier.h for details on notifiers.
+
+There are two different CPUFreq notifiers - policy notifiers and
+transition notifiers.
+
+
+2.1 CPUFreq policy notifiers
+----------------------------
+
+These are notified when a new policy is intended to be set. Each
+CPUFreq policy notifier is called three times for a policy transition:
+
+1.) During CPUFREQ_ADJUST all CPUFreq notifiers may change the limit if
+ they see a need for this - may it be thermal considerations or
+ hardware limitations.
+
+2.) During CPUFREQ_INCOMPATIBLE only changes may be done in order to avoid
+ hardware failure.
+
+3.) And during CPUFREQ_NOTIFY all notifiers are informed of the new policy
+ - if two hardware drivers failed to agree on a new policy before this
+ stage, the incompatible hardware shall be shut down, and the user
+ informed of this.
+
+The phase is specified in the second argument to the notifier.
+
+The third argument, a void *pointer, points to a struct cpufreq_policy
+consisting of five values: cpu, min, max, policy and max_cpu_freq. min
+and max are the lower and upper frequencies (in kHz) of the new
+policy, policy the new policy, cpu the number of the affected CPU; and
+max_cpu_freq the maximum supported CPU frequency. This value is given
+for informational purposes only.
+
+
+2.2 CPUFreq transition notifiers
+--------------------------------
+
+These are notified twice when the CPUfreq driver switches the CPU core
+frequency and this change has any external implications.
+
+The second argument specifies the phase - CPUFREQ_PRECHANGE or
+CPUFREQ_POSTCHANGE.
+
+The third argument is a struct cpufreq_freqs with the following
+values:
+cpu - number of the affected CPU
+old - old frequency
+new - new frequency
+
+3. CPUFreq Table Generation with Operating Performance Point (OPP)
+==================================================================
+For details about OPP, see Documentation/power/opp.txt
+
+dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
+ cpufreq_frequency_table_cpuinfo which is provided with the list of
+ frequencies that are available for operation. This function provides
+ a ready to use conversion routine to translate the OPP layer's internal
+ information about the available frequencies into a format readily
+ providable to cpufreq.
+
+ WARNING: Do not use this function in interrupt context.
+
+ Example:
+ soc_pm_init()
+ {
+ /* Do things */
+ r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
+ if (!r)
+ cpufreq_frequency_table_cpuinfo(policy, freq_table);
+ /* Do other things */
+ }
+
+ NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
+ addition to CONFIG_PM_OPP.
+
+dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
new file mode 100644
index 000000000..14f4e6336
--- /dev/null
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -0,0 +1,274 @@
+ CPU frequency and voltage scaling code in the Linux(TM) kernel
+
+
+ L i n u x C P U F r e q
+
+ C P U D r i v e r s
+
+ - information for developers -
+
+
+ Dominik Brodowski <linux@brodo.de>
+
+
+
+ Clock scaling allows you to change the clock speed of the CPUs on the
+ fly. This is a nice method to save battery power, because the lower
+ the clock speed, the less power the CPU consumes.
+
+
+Contents:
+---------
+1. What To Do?
+1.1 Initialization
+1.2 Per-CPU Initialization
+1.3 verify
+1.4 target/target_index or setpolicy?
+1.5 target/target_index
+1.6 setpolicy
+1.7 get_intermediate and target_intermediate
+2. Frequency Table Helpers
+
+
+
+1. What To Do?
+==============
+
+So, you just got a brand-new CPU / chipset with datasheets and want to
+add cpufreq support for this CPU / chipset? Great. Here are some hints
+on what is necessary:
+
+
+1.1 Initialization
+------------------
+
+First of all, in an __initcall level 7 (module_init()) or later
+function check whether this kernel runs on the right CPU and the right
+chipset. If so, register a struct cpufreq_driver with the CPUfreq core
+using cpufreq_register_driver()
+
+What shall this struct cpufreq_driver contain?
+
+cpufreq_driver.name - The name of this driver.
+
+cpufreq_driver.init - A pointer to the per-CPU initialization
+ function.
+
+cpufreq_driver.verify - A pointer to a "verification" function.
+
+cpufreq_driver.setpolicy _or_
+cpufreq_driver.target/
+target_index - See below on the differences.
+
+And optionally
+
+cpufreq_driver.exit - A pointer to a per-CPU cleanup
+ function called during CPU_POST_DEAD
+ phase of cpu hotplug process.
+
+cpufreq_driver.stop_cpu - A pointer to a per-CPU stop function
+ called during CPU_DOWN_PREPARE phase of
+ cpu hotplug process.
+
+cpufreq_driver.resume - A pointer to a per-CPU resume function
+ which is called with interrupts disabled
+ and _before_ the pre-suspend frequency
+ and/or policy is restored by a call to
+ ->target/target_index or ->setpolicy.
+
+cpufreq_driver.attr - A pointer to a NULL-terminated list of
+ "struct freq_attr" which allow to
+ export values to sysfs.
+
+cpufreq_driver.get_intermediate
+and target_intermediate Used to switch to stable frequency while
+ changing CPU frequency.
+
+
+1.2 Per-CPU Initialization
+--------------------------
+
+Whenever a new CPU is registered with the device model, or after the
+cpufreq driver registers itself, the per-CPU initialization function
+cpufreq_driver.init is called. It takes a struct cpufreq_policy
+*policy as argument. What to do now?
+
+If necessary, activate the CPUfreq support on your CPU.
+
+Then, the driver must fill in the following values:
+
+policy->cpuinfo.min_freq _and_
+policy->cpuinfo.max_freq - the minimum and maximum frequency
+ (in kHz) which is supported by
+ this CPU
+policy->cpuinfo.transition_latency the time it takes on this CPU to
+ switch between two frequencies in
+ nanoseconds (if appropriate, else
+ specify CPUFREQ_ETERNAL)
+
+policy->cur The current operating frequency of
+ this CPU (if appropriate)
+policy->min,
+policy->max,
+policy->policy and, if necessary,
+policy->governor must contain the "default policy" for
+ this CPU. A few moments later,
+ cpufreq_driver.verify and either
+ cpufreq_driver.setpolicy or
+ cpufreq_driver.target/target_index is called
+ with these values.
+
+For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the
+frequency table helpers might be helpful. See the section 2 for more information
+on them.
+
+SMP systems normally have same clock source for a group of cpus. For these the
+.init() would be called only once for the first online cpu. Here the .init()
+routine must initialize policy->cpus with mask of all possible cpus (Online +
+Offline) that share the clock. Then the core would copy this mask onto
+policy->related_cpus and will reset policy->cpus to carry only online cpus.
+
+
+1.3 verify
+------------
+
+When the user decides a new policy (consisting of
+"policy,governor,min,max") shall be set, this policy must be validated
+so that incompatible values can be corrected. For verifying these
+values, a frequency table helper and/or the
+cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned
+int min_freq, unsigned int max_freq) function might be helpful. See
+section 2 for details on frequency table helpers.
+
+You need to make sure that at least one valid frequency (or operating
+range) is within policy->min and policy->max. If necessary, increase
+policy->max first, and only if this is no solution, decrease policy->min.
+
+
+1.4 target/target_index or setpolicy?
+----------------------------
+
+Most cpufreq drivers or even most cpu frequency scaling algorithms
+only allow the CPU to be set to one frequency. For these, you use the
+->target/target_index call.
+
+Some cpufreq-capable processors switch the frequency between certain
+limits on their own. These shall use the ->setpolicy call
+
+
+1.5. target/target_index
+-------------
+
+The target_index call has two arguments: struct cpufreq_policy *policy,
+and unsigned int index (into the exposed frequency table).
+
+The CPUfreq driver must set the new frequency when called here. The
+actual frequency must be determined by freq_table[index].frequency.
+
+It should always restore to earlier frequency (i.e. policy->restore_freq) in
+case of errors, even if we switched to intermediate frequency earlier.
+
+Deprecated:
+----------
+The target call has three arguments: struct cpufreq_policy *policy,
+unsigned int target_frequency, unsigned int relation.
+
+The CPUfreq driver must set the new frequency when called here. The
+actual frequency must be determined using the following rules:
+
+- keep close to "target_freq"
+- policy->min <= new_freq <= policy->max (THIS MUST BE VALID!!!)
+- if relation==CPUFREQ_REL_L, try to select a new_freq higher than or equal
+ target_freq. ("L for lowest, but no lower than")
+- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
+ target_freq. ("H for highest, but no higher than")
+
+Here again the frequency table helper might assist you - see section 2
+for details.
+
+
+1.6 setpolicy
+---------------
+
+The setpolicy call only takes a struct cpufreq_policy *policy as
+argument. You need to set the lower limit of the in-processor or
+in-chipset dynamic frequency switching to policy->min, the upper limit
+to policy->max, and -if supported- select a performance-oriented
+setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a
+powersaving-oriented setting when CPUFREQ_POLICY_POWERSAVE. Also check
+the reference implementation in drivers/cpufreq/longrun.c
+
+1.7 get_intermediate and target_intermediate
+--------------------------------------------
+
+Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
+
+get_intermediate should return a stable intermediate frequency platform wants to
+switch to, and target_intermediate() should set CPU to to that frequency, before
+jumping to the frequency corresponding to 'index'. Core will take care of
+sending notifications and driver doesn't have to handle them in
+target_intermediate() or target_index().
+
+Drivers can return '0' from get_intermediate() in case they don't wish to switch
+to intermediate frequency for some target frequency. In that case core will
+directly call ->target_index().
+
+NOTE: ->target_index() should restore to policy->restore_freq in case of
+failures as core would send notifications for that.
+
+
+2. Frequency Table Helpers
+==========================
+
+As most cpufreq processors only allow for being set to a few specific
+frequencies, a "frequency table" with some functions might assist in
+some work of the processor driver. Such a "frequency table" consists
+of an array of struct cpufreq_frequency_table entries, with any value in
+"driver_data" you want to use, and the corresponding frequency in
+"frequency". At the end of the table, you need to add a
+cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. And
+if you want to skip one entry in the table, set the frequency to
+CPUFREQ_ENTRY_INVALID. The entries don't need to be in ascending
+order.
+
+By calling cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
+ struct cpufreq_frequency_table *table);
+the cpuinfo.min_freq and cpuinfo.max_freq values are detected, and
+policy->min and policy->max are set to the same values. This is
+helpful for the per-CPU initialization stage.
+
+int cpufreq_frequency_table_verify(struct cpufreq_policy *policy,
+ struct cpufreq_frequency_table *table);
+assures that at least one valid frequency is within policy->min and
+policy->max, and all other criteria are met. This is helpful for the
+->verify call.
+
+int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
+ struct cpufreq_frequency_table *table,
+ unsigned int target_freq,
+ unsigned int relation,
+ unsigned int *index);
+
+is the corresponding frequency table helper for the ->target
+stage. Just pass the values to this function, and the unsigned int
+index returns the number of the frequency table entry which contains
+the frequency the CPU shall be set to.
+
+The following macros can be used as iterators over cpufreq_frequency_table:
+
+cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency
+table.
+
+cpufreq-for_each_valid_entry(pos, table) - iterates over all entries,
+excluding CPUFREQ_ENTRY_INVALID frequencies.
+Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and
+"table" - the cpufreq_frequency_table * you want to iterate over.
+
+For example:
+
+ struct cpufreq_frequency_table *pos, *driver_freq_table;
+
+ cpufreq_for_each_entry(pos, driver_freq_table) {
+ /* Do something with pos */
+ pos->frequency = ...
+ }
diff --git a/Documentation/cpu-freq/cpufreq-nforce2.txt b/Documentation/cpu-freq/cpufreq-nforce2.txt
new file mode 100644
index 000000000..babce1315
--- /dev/null
+++ b/Documentation/cpu-freq/cpufreq-nforce2.txt
@@ -0,0 +1,19 @@
+
+The cpufreq-nforce2 driver changes the FSB on nVidia nForce2 platforms.
+
+This works better than on other platforms, because the FSB of the CPU
+can be controlled independently from the PCI/AGP clock.
+
+The module has two options:
+
+ fid: multiplier * 10 (for example 8.5 = 85)
+ min_fsb: minimum FSB
+
+If not set, fid is calculated from the current CPU speed and the FSB.
+min_fsb defaults to FSB at boot time - 50 MHz.
+
+IMPORTANT: The available range is limited downwards!
+ Also the minimum available FSB can differ, for systems
+ booting with 200 MHz, 150 should always work.
+
+
diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt
new file mode 100644
index 000000000..fc647492e
--- /dev/null
+++ b/Documentation/cpu-freq/cpufreq-stats.txt
@@ -0,0 +1,128 @@
+
+ CPU frequency and voltage scaling statistics in the Linux(TM) kernel
+
+
+ L i n u x c p u f r e q - s t a t s d r i v e r
+
+ - information for users -
+
+
+ Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
+
+Contents
+1. Introduction
+2. Statistics Provided (with example)
+3. Configuring cpufreq-stats
+
+
+1. Introduction
+
+cpufreq-stats is a driver that provides CPU frequency statistics for each CPU.
+These statistics are provided in /sysfs as a bunch of read_only interfaces. This
+interface (when configured) will appear in a separate directory under cpufreq
+in /sysfs (<sysfs root>/devices/system/cpu/cpuX/cpufreq/stats/) for each CPU.
+Various statistics will form read_only files under this directory.
+
+This driver is designed to be independent of any particular cpufreq_driver
+that may be running on your CPU. So, it will work with any cpufreq_driver.
+
+
+2. Statistics Provided (with example)
+
+cpufreq stats provides following statistics (explained in detail below).
+- time_in_state
+- total_trans
+- trans_table
+
+All the statistics will be from the time the stats driver has been inserted
+to the time when a read of a particular statistic is done. Obviously, stats
+driver will not have any information about the frequency transitions before
+the stats driver insertion.
+
+--------------------------------------------------------------------------------
+<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l
+total 0
+drwxr-xr-x 2 root root 0 May 14 16:06 .
+drwxr-xr-x 3 root root 0 May 14 15:58 ..
+-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
+-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
+-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
+--------------------------------------------------------------------------------
+
+- time_in_state
+This gives the amount of time spent in each of the frequencies supported by
+this CPU. The cat output will have "<frequency> <time>" pair in each line, which
+will mean this CPU spent <time> usertime units of time at <frequency>. Output
+will have one line for each of the supported frequencies. usertime units here
+is 10mS (similar to other time exported in /proc).
+
+--------------------------------------------------------------------------------
+<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat time_in_state
+3600000 2089
+3400000 136
+3200000 34
+3000000 67
+2800000 172488
+--------------------------------------------------------------------------------
+
+
+- total_trans
+This gives the total number of frequency transitions on this CPU. The cat
+output will have a single count which is the total number of frequency
+transitions.
+
+--------------------------------------------------------------------------------
+<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat total_trans
+20
+--------------------------------------------------------------------------------
+
+- trans_table
+This will give a fine grained information about all the CPU frequency
+transitions. The cat output here is a two dimensional matrix, where an entry
+<i,j> (row i, column j) represents the count of number of transitions from
+Freq_i to Freq_j. Freq_i is in descending order with increasing rows and
+Freq_j is in descending order with increasing columns. The output here also
+contains the actual freq values for each row and column for better readability.
+
+--------------------------------------------------------------------------------
+<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table
+ From : To
+ : 3600000 3400000 3200000 3000000 2800000
+ 3600000: 0 5 0 0 0
+ 3400000: 4 0 2 0 0
+ 3200000: 0 1 0 2 0
+ 3000000: 0 0 1 0 3
+ 2800000: 0 0 0 2 0
+--------------------------------------------------------------------------------
+
+
+3. Configuring cpufreq-stats
+
+To configure cpufreq-stats in your kernel
+Config Main Menu
+ Power management options (ACPI, APM) --->
+ CPU Frequency scaling --->
+ [*] CPU Frequency scaling
+ <*> CPU frequency translation statistics
+ [*] CPU frequency translation statistics details
+
+
+"CPU Frequency scaling" (CONFIG_CPU_FREQ) should be enabled to configure
+cpufreq-stats.
+
+"CPU frequency translation statistics" (CONFIG_CPU_FREQ_STAT) provides the
+basic statistics which includes time_in_state and total_trans.
+
+"CPU frequency translation statistics details" (CONFIG_CPU_FREQ_STAT_DETAILS)
+provides fine grained cpufreq stats by trans_table. The reason for having a
+separate config option for trans_table is:
+- trans_table goes against the traditional /sysfs rule of one value per
+ interface. It provides a whole bunch of value in a 2 dimensional matrix
+ form.
+
+Once these two options are enabled and your CPU supports cpufrequency, you
+will be able to see the CPU frequency statistics in /sysfs.
+
+
+
+
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
new file mode 100644
index 000000000..77ec21574
--- /dev/null
+++ b/Documentation/cpu-freq/governors.txt
@@ -0,0 +1,269 @@
+ CPU frequency and voltage scaling code in the Linux(TM) kernel
+
+
+ L i n u x C P U F r e q
+
+ C P U F r e q G o v e r n o r s
+
+ - information for users and developers -
+
+
+ Dominik Brodowski <linux@brodo.de>
+ some additions and corrections by Nico Golde <nico@ngolde.de>
+
+
+
+ Clock scaling allows you to change the clock speed of the CPUs on the
+ fly. This is a nice method to save battery power, because the lower
+ the clock speed, the less power the CPU consumes.
+
+
+Contents:
+---------
+1. What is a CPUFreq Governor?
+
+2. Governors In the Linux Kernel
+2.1 Performance
+2.2 Powersave
+2.3 Userspace
+2.4 Ondemand
+2.5 Conservative
+
+3. The Governor Interface in the CPUfreq Core
+
+
+
+1. What Is A CPUFreq Governor?
+==============================
+
+Most cpufreq drivers (in fact, all except one, longrun) or even most
+cpu frequency scaling algorithms only offer the CPU to be set to one
+frequency. In order to offer dynamic frequency scaling, the cpufreq
+core must be able to tell these drivers of a "target frequency". So
+these specific drivers will be transformed to offer a "->target/target_index"
+call instead of the existing "->setpolicy" call. For "longrun", all
+stays the same, though.
+
+How to decide what frequency within the CPUfreq policy should be used?
+That's done using "cpufreq governors". Two are already in this patch
+-- they're the already existing "powersave" and "performance" which
+set the frequency statically to the lowest or highest frequency,
+respectively. At least two more such governors will be ready for
+addition in the near future, but likely many more as there are various
+different theories and models about dynamic frequency scaling
+around. Using such a generic interface as cpufreq offers to scaling
+governors, these can be tested extensively, and the best one can be
+selected for each specific use.
+
+Basically, it's the following flow graph:
+
+CPU can be set to switch independently | CPU can only be set
+ within specific "limits" | to specific frequencies
+
+ "CPUfreq policy"
+ consists of frequency limits (policy->{min,max})
+ and CPUfreq governor to be used
+ / \
+ / \
+ / the cpufreq governor decides
+ / (dynamically or statically)
+ / what target_freq to set within
+ / the limits of policy->{min,max}
+ / \
+ / \
+ Using the ->setpolicy call, Using the ->target/target_index call,
+ the limits and the the frequency closest
+ "policy" is set. to target_freq is set.
+ It is assured that it
+ is within policy->{min,max}
+
+
+2. Governors In the Linux Kernel
+================================
+
+2.1 Performance
+---------------
+
+The CPUfreq governor "performance" sets the CPU statically to the
+highest frequency within the borders of scaling_min_freq and
+scaling_max_freq.
+
+
+2.2 Powersave
+-------------
+
+The CPUfreq governor "powersave" sets the CPU statically to the
+lowest frequency within the borders of scaling_min_freq and
+scaling_max_freq.
+
+
+2.3 Userspace
+-------------
+
+The CPUfreq governor "userspace" allows the user, or any userspace
+program running with UID "root", to set the CPU to a specific frequency
+by making a sysfs file "scaling_setspeed" available in the CPU-device
+directory.
+
+
+2.4 Ondemand
+------------
+
+The CPUfreq governor "ondemand" sets the CPU depending on the
+current usage. To do this the CPU must have the capability to
+switch the frequency very quickly. There are a number of sysfs file
+accessible parameters:
+
+sampling_rate: measured in uS (10^-6 seconds), this is how often you
+want the kernel to look at the CPU usage and to make decisions on
+what to do about the frequency. Typically this is set to values of
+around '10000' or more. It's default value is (cmp. with users-guide.txt):
+transition_latency * 1000
+Be aware that transition latency is in ns and sampling_rate is in us, so you
+get the same sysfs value by default.
+Sampling rate should always get adjusted considering the transition latency
+To set the sampling rate 750 times as high as the transition latency
+in the bash (as said, 1000 is default), do:
+echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
+ >ondemand/sampling_rate
+
+sampling_rate_min:
+The sampling rate is limited by the HW transition latency:
+transition_latency * 100
+Or by kernel restrictions:
+If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
+If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
+limits depend on the CONFIG_HZ option:
+HZ=1000: min=20000us (20ms)
+HZ=250: min=80000us (80ms)
+HZ=100: min=200000us (200ms)
+The highest value of kernel and HW latency restrictions is shown and
+used as the minimum sampling rate.
+
+up_threshold: defines what the average CPU usage between the samplings
+of 'sampling_rate' needs to be for the kernel to make a decision on
+whether it should increase the frequency. For example when it is set
+to its default value of '95' it means that between the checking
+intervals the CPU needs to be on average more than 95% in use to then
+decide that the CPU frequency needs to be increased.
+
+ignore_nice_load: this parameter takes a value of '0' or '1'. When
+set to '0' (its default), all processes are counted towards the
+'cpu utilisation' value. When set to '1', the processes that are
+run with a 'nice' value will not count (and thus be ignored) in the
+overall usage calculation. This is useful if you are running a CPU
+intensive calculation on your laptop that you do not care how long it
+takes to complete as you can 'nice' it and prevent it from taking part
+in the deciding process of whether to increase your CPU frequency.
+
+sampling_down_factor: this parameter controls the rate at which the
+kernel makes a decision on when to decrease the frequency while running
+at top speed. When set to 1 (the default) decisions to reevaluate load
+are made at the same interval regardless of current clock speed. But
+when set to greater than 1 (e.g. 100) it acts as a multiplier for the
+scheduling interval for reevaluating load when the CPU is at its top
+speed due to high load. This improves performance by reducing the overhead
+of load evaluation and helping the CPU stay at its top speed when truly
+busy, rather than shifting back and forth in speed. This tunable has no
+effect on behavior at lower speeds/lower CPU loads.
+
+powersave_bias: this parameter takes a value between 0 to 1000. It
+defines the percentage (times 10) value of the target frequency that
+will be shaved off of the target. For example, when set to 100 -- 10%,
+when ondemand governor would have targeted 1000 MHz, it will target
+1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
+(disabled) by default.
+When AMD frequency sensitivity powersave bias driver --
+drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
+defines the workload frequency sensitivity threshold in which a lower
+frequency is chosen instead of ondemand governor's original target.
+The frequency sensitivity is a hardware reported (on AMD Family 16h
+Processors and above) value between 0 to 100% that tells software how
+the performance of the workload running on a CPU will change when
+frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
+will not perform any better on higher core frequency, whereas a
+workload with sensitivity of 100% (CPU-bound) will perform better
+higher the frequency. When the driver is loaded, this is set to 400
+by default -- for CPUs running workloads with sensitivity value below
+40%, a lower frequency is chosen. Unloading the driver or writing 0
+will disable this feature.
+
+
+2.5 Conservative
+----------------
+
+The CPUfreq governor "conservative", much like the "ondemand"
+governor, sets the CPU depending on the current usage. It differs in
+behaviour in that it gracefully increases and decreases the CPU speed
+rather than jumping to max speed the moment there is any load on the
+CPU. This behaviour more suitable in a battery powered environment.
+The governor is tweaked in the same manner as the "ondemand" governor
+through sysfs with the addition of:
+
+freq_step: this describes what percentage steps the cpu freq should be
+increased and decreased smoothly by. By default the cpu frequency will
+increase in 5% chunks of your maximum cpu frequency. You can change this
+value to anywhere between 0 and 100 where '0' will effectively lock your
+CPU at a speed regardless of its load whilst '100' will, in theory, make
+it behave identically to the "ondemand" governor.
+
+down_threshold: same as the 'up_threshold' found for the "ondemand"
+governor but for the opposite direction. For example when set to its
+default value of '20' it means that if the CPU usage needs to be below
+20% between samples to have the frequency decreased.
+
+sampling_down_factor: similar functionality as in "ondemand" governor.
+But in "conservative", it controls the rate at which the kernel makes
+a decision on when to decrease the frequency while running in any
+speed. Load for frequency increase is still evaluated every
+sampling rate.
+
+3. The Governor Interface in the CPUfreq Core
+=============================================
+
+A new governor must register itself with the CPUfreq core using
+"cpufreq_register_governor". The struct cpufreq_governor, which has to
+be passed to that function, must contain the following values:
+
+governor->name - A unique name for this governor
+governor->governor - The governor callback function
+governor->owner - .THIS_MODULE for the governor module (if
+ appropriate)
+
+The governor->governor callback is called with the current (or to-be-set)
+cpufreq_policy struct for that CPU, and an unsigned int event. The
+following events are currently defined:
+
+CPUFREQ_GOV_START: This governor shall start its duty for the CPU
+ policy->cpu
+CPUFREQ_GOV_STOP: This governor shall end its duty for the CPU
+ policy->cpu
+CPUFREQ_GOV_LIMITS: The limits for CPU policy->cpu have changed to
+ policy->min and policy->max.
+
+If you need other "events" externally of your driver, _only_ use the
+cpufreq_governor_l(unsigned int cpu, unsigned int event) call to the
+CPUfreq core to ensure proper locking.
+
+
+The CPUfreq governor may call the CPU processor driver using one of
+these two functions:
+
+int cpufreq_driver_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation);
+
+int __cpufreq_driver_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation);
+
+target_freq must be within policy->min and policy->max, of course.
+What's the difference between these two functions? When your governor
+still is in a direct code path of a call to governor->governor, the
+per-CPU cpufreq lock is still held in the cpufreq core, and there's
+no need to lock it again (in fact, this would cause a deadlock). So
+use __cpufreq_driver_target only in these cases. In all other cases
+(for example, when there's a "daemonized" function that wakes up
+every second), use cpufreq_driver_target to lock the cpufreq per-CPU
+lock before the command is passed to the cpufreq processor driver.
+
diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt
new file mode 100644
index 000000000..dc024ab40
--- /dev/null
+++ b/Documentation/cpu-freq/index.txt
@@ -0,0 +1,54 @@
+ CPU frequency and voltage scaling code in the Linux(TM) kernel
+
+
+ L i n u x C P U F r e q
+
+
+
+
+ Dominik Brodowski <linux@brodo.de>
+
+
+
+ Clock scaling allows you to change the clock speed of the CPUs on the
+ fly. This is a nice method to save battery power, because the lower
+ the clock speed, the less power the CPU consumes.
+
+
+
+Documents in this directory:
+----------------------------
+core.txt - General description of the CPUFreq core and
+ of CPUFreq notifiers
+
+cpu-drivers.txt - How to implement a new cpufreq processor driver
+
+governors.txt - What are cpufreq governors and how to
+ implement them?
+
+index.txt - File index, Mailing list and Links (this document)
+
+user-guide.txt - User Guide to CPUFreq
+
+
+Mailing List
+------------
+There is a CPU frequency changing CVS commit and general list where
+you can report bugs, problems or submit patches. To post a message,
+send an email to linux-pm@vger.kernel.org, to subscribe go to
+http://vger.kernel.org/vger-lists.html#linux-pm and follow the
+instructions there.
+
+Links
+-----
+the FTP archives:
+* ftp://ftp.linux.org.uk/pub/linux/cpufreq/
+
+how to access the CVS repository:
+* http://cvs.arm.linux.org.uk/
+
+the CPUFreq Mailing list:
+* http://vger.kernel.org/vger-lists.html#cpufreq
+
+Clock and voltage scaling for the SA-1100:
+* http://www.lartmaker.nl/projects/scaling
diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
new file mode 100644
index 000000000..655750743
--- /dev/null
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -0,0 +1,64 @@
+Intel P-state driver
+--------------------
+
+This driver provides an interface to control the P state selection for
+SandyBridge+ Intel processors. The driver can operate two different
+modes based on the processor model legacy and Hardware P state (HWP)
+mode.
+
+In legacy mode the driver implements a scaling driver with an internal
+governor for Intel Core processors. The driver follows the same model
+as the Transmeta scaling driver (longrun.c) and implements the
+setpolicy() instead of target(). Scaling drivers that implement
+setpolicy() are assumed to implement internal governors by the cpufreq
+core. All the logic for selecting the current P state is contained
+within the driver; no external governor is used by the cpufreq core.
+
+In HWP mode P state selection is implemented in the processor
+itself. The driver provides the interfaces between the cpufreq core and
+the processor to control P state selection based on user preferences
+and reporting frequency to the cpufreq core. In this mode the
+internal governor code is disabled.
+
+In addtion to the interfaces provided by the cpufreq core for
+controlling frequency the driver provides sysfs files for
+controlling P state selection. These files have been added to
+/sys/devices/system/cpu/intel_pstate/
+
+ max_perf_pct: limits the maximum P state that will be requested by
+ the driver stated as a percentage of the available performance. The
+ available (P states) performance may be reduced by the no_turbo
+ setting described below.
+
+ min_perf_pct: limits the minimum P state that will be requested by
+ the driver stated as a percentage of the max (non-turbo)
+ performance level.
+
+ no_turbo: limits the driver to selecting P states below the turbo
+ frequency range.
+
+ turbo_pct: displays the percentage of the total performance that
+ is supported by hardware that is in the turbo range. This number
+ is independent of whether turbo has been disabled or not.
+
+ num_pstates: displays the number of pstates that are supported
+ by hardware. This number is independent of whether turbo has
+ been disabled or not.
+
+For contemporary Intel processors, the frequency is controlled by the
+processor itself and the P-states exposed to software are related to
+performance levels. The idea that frequency can be set to a single
+frequency is fiction for Intel Core processors. Even if the scaling
+driver selects a single P state the actual frequency the processor
+will run at is selected by the processor itself.
+
+For legacy mode debugfs files have also been added to allow tuning of
+the internal governor algorythm. These files are located at
+/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
+
+ deadband
+ d_gain_pct
+ i_gain_pct
+ p_gain_pct
+ sample_rate_ms
+ setpoint
diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt
new file mode 100644
index 000000000..9e3c3b335
--- /dev/null
+++ b/Documentation/cpu-freq/pcc-cpufreq.txt
@@ -0,0 +1,207 @@
+/*
+ * pcc-cpufreq.txt - PCC interface documentation
+ *
+ * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com>
+ * Copyright (C) 2009 Hewlett-Packard Development Company, L.P.
+ * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com>
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON
+ * INFRINGEMENT. See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+
+ Processor Clocking Control Driver
+ ---------------------------------
+
+Contents:
+---------
+1. Introduction
+1.1 PCC interface
+1.1.1 Get Average Frequency
+1.1.2 Set Desired Frequency
+1.2 Platforms affected
+2. Driver and /sys details
+2.1 scaling_available_frequencies
+2.2 cpuinfo_transition_latency
+2.3 cpuinfo_cur_freq
+2.4 related_cpus
+3. Caveats
+
+1. Introduction:
+----------------
+Processor Clocking Control (PCC) is an interface between the platform
+firmware and OSPM. It is a mechanism for coordinating processor
+performance (ie: frequency) between the platform firmware and the OS.
+
+The PCC driver (pcc-cpufreq) allows OSPM to take advantage of the PCC
+interface.
+
+OS utilizes the PCC interface to inform platform firmware what frequency the
+OS wants for a logical processor. The platform firmware attempts to achieve
+the requested frequency. If the request for the target frequency could not be
+satisfied by platform firmware, then it usually means that power budget
+conditions are in place, and "power capping" is taking place.
+
+1.1 PCC interface:
+------------------
+The complete PCC specification is available here:
+http://www.acpica.org/download/Processor-Clocking-Control-v1p0.pdf
+
+PCC relies on a shared memory region that provides a channel for communication
+between the OS and platform firmware. PCC also implements a "doorbell" that
+is used by the OS to inform the platform firmware that a command has been
+sent.
+
+The ACPI PCCH() method is used to discover the location of the PCC shared
+memory region. The shared memory region header contains the "command" and
+"status" interface. PCCH() also contains details on how to access the platform
+doorbell.
+
+The following commands are supported by the PCC interface:
+* Get Average Frequency
+* Set Desired Frequency
+
+The ACPI PCCP() method is implemented for each logical processor and is
+used to discover the offsets for the input and output buffers in the shared
+memory region.
+
+When PCC mode is enabled, the platform will not expose processor performance
+or throttle states (_PSS, _TSS and related ACPI objects) to OSPM. Therefore,
+the native P-state driver (such as acpi-cpufreq for Intel, powernow-k8 for
+AMD) will not load.
+
+However, OSPM remains in control of policy. The governor (eg: "ondemand")
+computes the required performance for each processor based on server workload.
+The PCC driver fills in the command interface, and the input buffer and
+communicates the request to the platform firmware. The platform firmware is
+responsible for delivering the requested performance.
+
+Each PCC command is "global" in scope and can affect all the logical CPUs in
+the system. Therefore, PCC is capable of performing "group" updates. With PCC
+the OS is capable of getting/setting the frequency of all the logical CPUs in
+the system with a single call to the BIOS.
+
+1.1.1 Get Average Frequency:
+----------------------------
+This command is used by the OSPM to query the running frequency of the
+processor since the last time this command was completed. The output buffer
+indicates the average unhalted frequency of the logical processor expressed as
+a percentage of the nominal (ie: maximum) CPU frequency. The output buffer
+also signifies if the CPU frequency is limited by a power budget condition.
+
+1.1.2 Set Desired Frequency:
+----------------------------
+This command is used by the OSPM to communicate to the platform firmware the
+desired frequency for a logical processor. The output buffer is currently
+ignored by OSPM. The next invocation of "Get Average Frequency" will inform
+OSPM if the desired frequency was achieved or not.
+
+1.2 Platforms affected:
+-----------------------
+The PCC driver will load on any system where the platform firmware:
+* supports the PCC interface, and the associated PCCH() and PCCP() methods
+* assumes responsibility for managing the hardware clocking controls in order
+to deliver the requested processor performance
+
+Currently, certain HP ProLiant platforms implement the PCC interface. On those
+platforms PCC is the "default" choice.
+
+However, it is possible to disable this interface via a BIOS setting. In
+such an instance, as is also the case on platforms where the PCC interface
+is not implemented, the PCC driver will fail to load silently.
+
+2. Driver and /sys details:
+---------------------------
+When the driver loads, it merely prints the lowest and the highest CPU
+frequencies supported by the platform firmware.
+
+The PCC driver loads with a message such as:
+pcc-cpufreq: (v1.00.00) driver loaded with frequency limits: 1600 MHz, 2933
+MHz
+
+This means that the OPSM can request the CPU to run at any frequency in
+between the limits (1600 MHz, and 2933 MHz) specified in the message.
+
+Internally, there is no need for the driver to convert the "target" frequency
+to a corresponding P-state.
+
+The VERSION number for the driver will be of the format v.xy.ab.
+eg: 1.00.02
+ ----- --
+ | |
+ | -- this will increase with bug fixes/enhancements to the driver
+ |-- this is the version of the PCC specification the driver adheres to
+
+
+The following is a brief discussion on some of the fields exported via the
+/sys filesystem and how their values are affected by the PCC driver:
+
+2.1 scaling_available_frequencies:
+----------------------------------
+scaling_available_frequencies is not created in /sys. No intermediate
+frequencies need to be listed because the BIOS will try to achieve any
+frequency, within limits, requested by the governor. A frequency does not have
+to be strictly associated with a P-state.
+
+2.2 cpuinfo_transition_latency:
+-------------------------------
+The cpuinfo_transition_latency field is 0. The PCC specification does
+not include a field to expose this value currently.
+
+2.3 cpuinfo_cur_freq:
+---------------------
+A) Often cpuinfo_cur_freq will show a value different than what is declared
+in the scaling_available_frequencies or scaling_cur_freq, or scaling_max_freq.
+This is due to "turbo boost" available on recent Intel processors. If certain
+conditions are met the BIOS can achieve a slightly higher speed than requested
+by OSPM. An example:
+
+scaling_cur_freq : 2933000
+cpuinfo_cur_freq : 3196000
+
+B) There is a round-off error associated with the cpuinfo_cur_freq value.
+Since the driver obtains the current frequency as a "percentage" (%) of the
+nominal frequency from the BIOS, sometimes, the values displayed by
+scaling_cur_freq and cpuinfo_cur_freq may not match. An example:
+
+scaling_cur_freq : 1600000
+cpuinfo_cur_freq : 1583000
+
+In this example, the nominal frequency is 2933 MHz. The driver obtains the
+current frequency, cpuinfo_cur_freq, as 54% of the nominal frequency:
+
+ 54% of 2933 MHz = 1583 MHz
+
+Nominal frequency is the maximum frequency of the processor, and it usually
+corresponds to the frequency of the P0 P-state.
+
+2.4 related_cpus:
+-----------------
+The related_cpus field is identical to affected_cpus.
+
+affected_cpus : 4
+related_cpus : 4
+
+Currently, the PCC driver does not evaluate _PSD. The platforms that support
+PCC do not implement SW_ALL. So OSPM doesn't need to perform any coordination
+to ensure that the same frequency is requested of all dependent CPUs.
+
+3. Caveats:
+-----------
+The "cpufreq_stats" module in its present form cannot be loaded and
+expected to work with the PCC driver. Since the "cpufreq_stats" module
+provides information wrt each P-state, it is not applicable to the PCC driver.
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
new file mode 100644
index 000000000..ff2f28332
--- /dev/null
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -0,0 +1,224 @@
+ CPU frequency and voltage scaling code in the Linux(TM) kernel
+
+
+ L i n u x C P U F r e q
+
+ U S E R G U I D E
+
+
+ Dominik Brodowski <linux@brodo.de>
+
+
+
+ Clock scaling allows you to change the clock speed of the CPUs on the
+ fly. This is a nice method to save battery power, because the lower
+ the clock speed, the less power the CPU consumes.
+
+
+Contents:
+---------
+1. Supported Architectures and Processors
+1.1 ARM
+1.2 x86
+1.3 sparc64
+1.4 ppc
+1.5 SuperH
+1.6 Blackfin
+
+2. "Policy" / "Governor"?
+2.1 Policy
+2.2 Governor
+
+3. How to change the CPU cpufreq policy and/or speed
+3.1 Preferred interface: sysfs
+
+
+
+1. Supported Architectures and Processors
+=========================================
+
+1.1 ARM
+-------
+
+The following ARM processors are supported by cpufreq:
+
+ARM Integrator
+ARM-SA1100
+ARM-SA1110
+Intel PXA
+
+
+1.2 x86
+-------
+
+The following processors for the x86 architecture are supported by cpufreq:
+
+AMD Elan - SC400, SC410
+AMD mobile K6-2+
+AMD mobile K6-3+
+AMD mobile Duron
+AMD mobile Athlon
+AMD Opteron
+AMD Athlon 64
+Cyrix Media GXm
+Intel mobile PIII and Intel mobile PIII-M on certain chipsets
+Intel Pentium 4, Intel Xeon
+Intel Pentium M (Centrino)
+National Semiconductors Geode GX
+Transmeta Crusoe
+Transmeta Efficeon
+VIA Cyrix 3 / C3
+various processors on some ACPI 2.0-compatible systems [*]
+
+[*] Only if "ACPI Processor Performance States" are available
+to the ACPI<->BIOS interface.
+
+
+1.3 sparc64
+-----------
+
+The following processors for the sparc64 architecture are supported by
+cpufreq:
+
+UltraSPARC-III
+
+
+1.4 ppc
+-------
+
+Several "PowerBook" and "iBook2" notebooks are supported.
+
+
+1.5 SuperH
+----------
+
+All SuperH processors supporting rate rounding through the clock
+framework are supported by cpufreq.
+
+1.6 Blackfin
+------------
+
+The following Blackfin processors are supported by cpufreq:
+
+BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
+BF531, BF532, BF533, Rev 0.3 or higher
+BF534, BF536, BF537, Rev 0.2 or higher
+BF561, Rev 0.3 or higher
+BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher
+
+
+2. "Policy" / "Governor" ?
+==========================
+
+Some CPU frequency scaling-capable processor switch between various
+frequencies and operating voltages "on the fly" without any kernel or
+user involvement. This guarantees very fast switching to a frequency
+which is high enough to serve the user's needs, but low enough to save
+power.
+
+
+2.1 Policy
+----------
+
+On these systems, all you can do is select the lower and upper
+frequency limit as well as whether you want more aggressive
+power-saving or more instantly available processing power.
+
+
+2.2 Governor
+------------
+
+On all other cpufreq implementations, these boundaries still need to
+be set. Then, a "governor" must be selected. Such a "governor" decides
+what speed the processor shall run within the boundaries. One such
+"governor" is the "userspace" governor. This one allows the user - or
+a yet-to-implement userspace program - to decide what specific speed
+the processor shall run at.
+
+
+3. How to change the CPU cpufreq policy and/or speed
+====================================================
+
+3.1 Preferred Interface: sysfs
+------------------------------
+
+The preferred interface is located in the sysfs filesystem. If you
+mounted it at /sys, the cpufreq interface is located in a subdirectory
+"cpufreq" within the cpu-device directory
+(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).
+
+cpuinfo_min_freq : this file shows the minimum operating
+ frequency the processor can run at(in kHz)
+cpuinfo_max_freq : this file shows the maximum operating
+ frequency the processor can run at(in kHz)
+cpuinfo_transition_latency The time it takes on this CPU to
+ switch between two frequencies in nano
+ seconds. If unknown or known to be
+ that high that the driver does not
+ work with the ondemand governor, -1
+ (CPUFREQ_ETERNAL) will be returned.
+ Using this information can be useful
+ to choose an appropriate polling
+ frequency for a kernel governor or
+ userspace daemon. Make sure to not
+ switch the frequency too often
+ resulting in performance loss.
+scaling_driver : this file shows what cpufreq driver is
+ used to set the frequency on this CPU
+
+scaling_available_governors : this file shows the CPUfreq governors
+ available in this kernel. You can see the
+ currently activated governor in
+
+scaling_governor, and by "echoing" the name of another
+ governor you can change it. Please note
+ that some governors won't load - they only
+ work on some specific architectures or
+ processors.
+
+cpuinfo_cur_freq : Current frequency of the CPU as obtained from
+ the hardware, in KHz. This is the frequency
+ the CPU actually runs at.
+
+scaling_available_frequencies : List of available frequencies, in KHz.
+
+scaling_min_freq and
+scaling_max_freq show the current "policy limits" (in
+ kHz). By echoing new values into these
+ files, you can change these limits.
+ NOTE: when setting a policy you need to
+ first set scaling_max_freq, then
+ scaling_min_freq.
+
+affected_cpus : List of Online CPUs that require software
+ coordination of frequency.
+
+related_cpus : List of Online + Offline CPUs that need software
+ coordination of frequency.
+
+scaling_driver : Hardware driver for cpufreq.
+
+scaling_cur_freq : Current frequency of the CPU as determined by
+ the governor and cpufreq core, in KHz. This is
+ the frequency the kernel thinks the CPU runs
+ at.
+
+bios_limit : If the BIOS tells the OS to limit a CPU to
+ lower frequencies, the user can read out the
+ maximum available frequency from this file.
+ This typically can happen through (often not
+ intended) BIOS settings, restrictions
+ triggered through a service processor or other
+ BIOS/HW based implementations.
+ This does not cover thermal ACPI limitations
+ which can be detected through the generic
+ thermal driver.
+
+If you have selected the "userspace" governor which allows you to
+set the CPU operating frequency to a specific value, you can read out
+the current frequency in
+
+scaling_setspeed. By "echoing" a new frequency into this
+ you can change the speed of the CPU,
+ but only within the limits of
+ scaling_min_freq and scaling_max_freq.