diff options
Diffstat (limited to 'Documentation/padata.txt')
-rw-r--r-- | Documentation/padata.txt | 160 |
1 files changed, 160 insertions, 0 deletions
diff --git a/Documentation/padata.txt b/Documentation/padata.txt new file mode 100644 index 000000000..7ddfe216a --- /dev/null +++ b/Documentation/padata.txt @@ -0,0 +1,160 @@ +The padata parallel execution mechanism +Last updated for 2.6.36 + +Padata is a mechanism by which the kernel can farm work out to be done in +parallel on multiple CPUs while retaining the ordering of tasks. It was +developed for use with the IPsec code, which needs to be able to perform +encryption and decryption on large numbers of packets without reordering +those packets. The crypto developers made a point of writing padata in a +sufficiently general fashion that it could be put to other uses as well. + +The first step in using padata is to set up a padata_instance structure for +overall control of how tasks are to be run: + + #include <linux/padata.h> + + struct padata_instance *padata_alloc(struct workqueue_struct *wq, + const struct cpumask *pcpumask, + const struct cpumask *cbcpumask); + +The pcpumask describes which processors will be used to execute work +submitted to this instance in parallel. The cbcpumask defines which +processors are allowed to be used as the serialization callback processor. +The workqueue wq is where the work will actually be done; it should be +a multithreaded queue, naturally. + +To allocate a padata instance with the cpu_possible_mask for both +cpumasks this helper function can be used: + + struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq); + +Note: Padata maintains two kinds of cpumasks internally. The user supplied +cpumasks, submitted by padata_alloc/padata_alloc_possible and the 'usable' +cpumasks. The usable cpumasks are always a subset of active CPUs in the +user supplied cpumasks; these are the cpumasks padata actually uses. So +it is legal to supply a cpumask to padata that contains offline CPUs. +Once an offline CPU in the user supplied cpumask comes online, padata +is going to use it. + +There are functions for enabling and disabling the instance: + + int padata_start(struct padata_instance *pinst); + void padata_stop(struct padata_instance *pinst); + +These functions are setting or clearing the "PADATA_INIT" flag; +if that flag is not set, other functions will refuse to work. +padata_start returns zero on success (flag set) or -EINVAL if the +padata cpumask contains no active CPU (flag not set). +padata_stop clears the flag and blocks until the padata instance +is unused. + +The list of CPUs to be used can be adjusted with these functions: + + int padata_set_cpumasks(struct padata_instance *pinst, + cpumask_var_t pcpumask, + cpumask_var_t cbcpumask); + int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, + cpumask_var_t cpumask); + int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask); + int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask); + +Changing the CPU masks are expensive operations, though, so it should not be +done with great frequency. + +It's possible to change both cpumasks of a padata instance with +padata_set_cpumasks by specifying the cpumasks for parallel execution (pcpumask) +and for the serial callback function (cbcpumask). padata_set_cpumask is used to +change just one of the cpumasks. Here cpumask_type is one of PADATA_CPU_SERIAL, +PADATA_CPU_PARALLEL and cpumask specifies the new cpumask to use. +To simply add or remove one CPU from a certain cpumask the functions +padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or +remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL. + +If a user is interested in padata cpumask changes, he can register to +the padata cpumask change notifier: + + int padata_register_cpumask_notifier(struct padata_instance *pinst, + struct notifier_block *nblock); + +To unregister from that notifier: + + int padata_unregister_cpumask_notifier(struct padata_instance *pinst, + struct notifier_block *nblock); + +The padata cpumask change notifier notifies about changes of the usable +cpumasks, i.e. the subset of active CPUs in the user supplied cpumask. + +Padata calls the notifier chain with: + + blocking_notifier_call_chain(&pinst->cpumask_change_notifier, + notification_mask, + &pd_new->cpumask); + +Here cpumask_change_notifier is registered notifier, notification_mask +is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer +to a struct padata_cpumask that contains the new cpumask information. + +Actually submitting work to the padata instance requires the creation of a +padata_priv structure: + + struct padata_priv { + /* Other stuff here... */ + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); + }; + +This structure will almost certainly be embedded within some larger +structure specific to the work to be done. Most of its fields are private to +padata, but the structure should be zeroed at initialisation time, and the +parallel() and serial() functions should be provided. Those functions will +be called in the process of getting the work done as we will see +momentarily. + +The submission of work is done with: + + int padata_do_parallel(struct padata_instance *pinst, + struct padata_priv *padata, int cb_cpu); + +The pinst and padata structures must be set up as described above; cb_cpu +specifies which CPU will be used for the final callback when the work is +done; it must be in the current instance's CPU mask. The return value from +padata_do_parallel() is zero on success, indicating that the work is in +progress. -EBUSY means that somebody, somewhere else is messing with the +instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being +in that CPU mask or about a not running instance. + +Each task submitted to padata_do_parallel() will, in turn, be passed to +exactly one call to the above-mentioned parallel() function, on one CPU, so +true parallelism is achieved by submitting multiple tasks. Despite the +fact that the workqueue is used to make these calls, parallel() is run with +software interrupts disabled and thus cannot sleep. The parallel() +function gets the padata_priv structure pointer as its lone parameter; +information about the actual work to be done is probably obtained by using +container_of() to find the enclosing structure. + +Note that parallel() has no return value; the padata subsystem assumes that +parallel() will take responsibility for the task from this point. The work +need not be completed during this call, but, if parallel() leaves work +outstanding, it should be prepared to be called again with a new job before +the previous one completes. When a task does complete, parallel() (or +whatever function actually finishes the job) should inform padata of the +fact with a call to: + + void padata_do_serial(struct padata_priv *padata); + +At some point in the future, padata_do_serial() will trigger a call to the +serial() function in the padata_priv structure. That call will happen on +the CPU requested in the initial call to padata_do_parallel(); it, too, is +done through the workqueue, but with local software interrupts disabled. +Note that this call may be deferred for a while since the padata code takes +pains to ensure that tasks are completed in the order in which they were +submitted. + +The one remaining function in the padata API should be called to clean up +when a padata instance is no longer needed: + + void padata_free(struct padata_instance *pinst); + +This function will busy-wait while any remaining tasks are completed, so it +might be best not to call it while there is work outstanding. Shutting +down the workqueue, if necessary, should be done separately. |