summaryrefslogtreecommitdiff
path: root/Documentation/padata.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/padata.txt')
-rw-r--r--Documentation/padata.txt160
1 files changed, 160 insertions, 0 deletions
diff --git a/Documentation/padata.txt b/Documentation/padata.txt
new file mode 100644
index 000000000..7ddfe216a
--- /dev/null
+++ b/Documentation/padata.txt
@@ -0,0 +1,160 @@
+The padata parallel execution mechanism
+Last updated for 2.6.36
+
+Padata is a mechanism by which the kernel can farm work out to be done in
+parallel on multiple CPUs while retaining the ordering of tasks. It was
+developed for use with the IPsec code, which needs to be able to perform
+encryption and decryption on large numbers of packets without reordering
+those packets. The crypto developers made a point of writing padata in a
+sufficiently general fashion that it could be put to other uses as well.
+
+The first step in using padata is to set up a padata_instance structure for
+overall control of how tasks are to be run:
+
+ #include <linux/padata.h>
+
+ struct padata_instance *padata_alloc(struct workqueue_struct *wq,
+ const struct cpumask *pcpumask,
+ const struct cpumask *cbcpumask);
+
+The pcpumask describes which processors will be used to execute work
+submitted to this instance in parallel. The cbcpumask defines which
+processors are allowed to be used as the serialization callback processor.
+The workqueue wq is where the work will actually be done; it should be
+a multithreaded queue, naturally.
+
+To allocate a padata instance with the cpu_possible_mask for both
+cpumasks this helper function can be used:
+
+ struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);
+
+Note: Padata maintains two kinds of cpumasks internally. The user supplied
+cpumasks, submitted by padata_alloc/padata_alloc_possible and the 'usable'
+cpumasks. The usable cpumasks are always a subset of active CPUs in the
+user supplied cpumasks; these are the cpumasks padata actually uses. So
+it is legal to supply a cpumask to padata that contains offline CPUs.
+Once an offline CPU in the user supplied cpumask comes online, padata
+is going to use it.
+
+There are functions for enabling and disabling the instance:
+
+ int padata_start(struct padata_instance *pinst);
+ void padata_stop(struct padata_instance *pinst);
+
+These functions are setting or clearing the "PADATA_INIT" flag;
+if that flag is not set, other functions will refuse to work.
+padata_start returns zero on success (flag set) or -EINVAL if the
+padata cpumask contains no active CPU (flag not set).
+padata_stop clears the flag and blocks until the padata instance
+is unused.
+
+The list of CPUs to be used can be adjusted with these functions:
+
+ int padata_set_cpumasks(struct padata_instance *pinst,
+ cpumask_var_t pcpumask,
+ cpumask_var_t cbcpumask);
+ int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
+ cpumask_var_t cpumask);
+ int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
+ int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);
+
+Changing the CPU masks are expensive operations, though, so it should not be
+done with great frequency.
+
+It's possible to change both cpumasks of a padata instance with
+padata_set_cpumasks by specifying the cpumasks for parallel execution (pcpumask)
+and for the serial callback function (cbcpumask). padata_set_cpumask is used to
+change just one of the cpumasks. Here cpumask_type is one of PADATA_CPU_SERIAL,
+PADATA_CPU_PARALLEL and cpumask specifies the new cpumask to use.
+To simply add or remove one CPU from a certain cpumask the functions
+padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or
+remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL.
+
+If a user is interested in padata cpumask changes, he can register to
+the padata cpumask change notifier:
+
+ int padata_register_cpumask_notifier(struct padata_instance *pinst,
+ struct notifier_block *nblock);
+
+To unregister from that notifier:
+
+ int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
+ struct notifier_block *nblock);
+
+The padata cpumask change notifier notifies about changes of the usable
+cpumasks, i.e. the subset of active CPUs in the user supplied cpumask.
+
+Padata calls the notifier chain with:
+
+ blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
+ notification_mask,
+ &pd_new->cpumask);
+
+Here cpumask_change_notifier is registered notifier, notification_mask
+is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer
+to a struct padata_cpumask that contains the new cpumask information.
+
+Actually submitting work to the padata instance requires the creation of a
+padata_priv structure:
+
+ struct padata_priv {
+ /* Other stuff here... */
+ void (*parallel)(struct padata_priv *padata);
+ void (*serial)(struct padata_priv *padata);
+ };
+
+This structure will almost certainly be embedded within some larger
+structure specific to the work to be done. Most of its fields are private to
+padata, but the structure should be zeroed at initialisation time, and the
+parallel() and serial() functions should be provided. Those functions will
+be called in the process of getting the work done as we will see
+momentarily.
+
+The submission of work is done with:
+
+ int padata_do_parallel(struct padata_instance *pinst,
+ struct padata_priv *padata, int cb_cpu);
+
+The pinst and padata structures must be set up as described above; cb_cpu
+specifies which CPU will be used for the final callback when the work is
+done; it must be in the current instance's CPU mask. The return value from
+padata_do_parallel() is zero on success, indicating that the work is in
+progress. -EBUSY means that somebody, somewhere else is messing with the
+instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
+in that CPU mask or about a not running instance.
+
+Each task submitted to padata_do_parallel() will, in turn, be passed to
+exactly one call to the above-mentioned parallel() function, on one CPU, so
+true parallelism is achieved by submitting multiple tasks. Despite the
+fact that the workqueue is used to make these calls, parallel() is run with
+software interrupts disabled and thus cannot sleep. The parallel()
+function gets the padata_priv structure pointer as its lone parameter;
+information about the actual work to be done is probably obtained by using
+container_of() to find the enclosing structure.
+
+Note that parallel() has no return value; the padata subsystem assumes that
+parallel() will take responsibility for the task from this point. The work
+need not be completed during this call, but, if parallel() leaves work
+outstanding, it should be prepared to be called again with a new job before
+the previous one completes. When a task does complete, parallel() (or
+whatever function actually finishes the job) should inform padata of the
+fact with a call to:
+
+ void padata_do_serial(struct padata_priv *padata);
+
+At some point in the future, padata_do_serial() will trigger a call to the
+serial() function in the padata_priv structure. That call will happen on
+the CPU requested in the initial call to padata_do_parallel(); it, too, is
+done through the workqueue, but with local software interrupts disabled.
+Note that this call may be deferred for a while since the padata code takes
+pains to ensure that tasks are completed in the order in which they were
+submitted.
+
+The one remaining function in the padata API should be called to clean up
+when a padata instance is no longer needed:
+
+ void padata_free(struct padata_instance *pinst);
+
+This function will busy-wait while any remaining tasks are completed, so it
+might be best not to call it while there is work outstanding. Shutting
+down the workqueue, if necessary, should be done separately.