diff -urN ./Documentation/ABI/testing/debugfs-aufs ./Documentation/ABI/testing/debugfs-aufs --- ./Documentation/ABI/testing/debugfs-aufs 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/ABI/testing/debugfs-aufs 2009-04-06 10:26:45.000000000 +0200 @@ -0,0 +1,40 @@ +What: /debug/aufs/si_/ +Date: March 2009 +Contact: J. R. Okajima +Description: + Under /debug/aufs, a directory named si_ is created + per aufs mount, where is a unique id generated + internally. + +What: /debug/aufs/si_/xib +Date: March 2009 +Contact: J. R. Okajima +Description: + It shows the consumed blocks by xib (External Inode Number + Bitmap), its block size and file size. + When the aufs mount option 'noxino' is specified, it + will be empty. About XINO files, see + Documentation/filesystems/aufs/aufs.5 in detail. + +What: /debug/aufs/si_/xino0, xino1 ... xinoN +Date: March 2009 +Contact: J. R. Okajima +Description: + It shows the consumed blocks by xino (External Inode Number + Translation Table), its link count, block size and file + size. + When the aufs mount option 'noxino' is specified, it + will be empty. About XINO files, see + Documentation/filesystems/aufs/aufs.5 in detail. + +What: /debug/aufs/si_/xigen +Date: March 2009 +Contact: J. R. Okajima +Description: + It shows the consumed blocks by xigen (External Inode + Generation Table), its block size and file size. + If CONFIG_AUFS_EXPORT is disabled, this entry will not + be created. + When the aufs mount option 'noxino' is specified, it + will be empty. About XINO files, see + Documentation/filesystems/aufs/aufs.5 in detail. diff -urN ./Documentation/ABI/testing/sysfs-aufs ./Documentation/ABI/testing/sysfs-aufs --- ./Documentation/ABI/testing/sysfs-aufs 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/ABI/testing/sysfs-aufs 2009-04-06 10:26:45.000000000 +0200 @@ -0,0 +1,25 @@ +What: /sys/fs/aufs/si_/ +Date: March 2009 +Contact: J. R. Okajima +Description: + Under /sys/fs/aufs, a directory named si_ is created + per aufs mount, where is a unique id generated + internally. + +What: /sys/fs/aufs/si_/br0, br1 ... brN +Date: March 2009 +Contact: J. R. Okajima +Description: + It shows the abolute path of a member directory (which + is called branch) in aufs, and its permission. + +What: /sys/fs/aufs/si_/xi_path +Date: March 2009 +Contact: J. R. Okajima +Description: + It shows the abolute path of XINO (External Inode Number + Bitmap, Translation Table and Generation Table) file + even if it is the default path. + When the aufs mount option 'noxino' is specified, it + will be empty. About XINO files, see + Documentation/filesystems/aufs/aufs.5 in detail. diff -urN ./Documentation/filesystems/aufs/README ./Documentation/filesystems/aufs/README --- ./Documentation/filesystems/aufs/README 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/README 2009-04-06 10:26:45.000000000 +0200 @@ -0,0 +1,253 @@ + +Aufs2 -- advanced multi layered unification filesystem version 2 +http://aufs.sf.net +Junjiro R. Okajima + + +0. Introduction +---------------------------------------- +In the early days, aufs was entirely re-designed and re-implemented +Unionfs Version 1.x series. After many original ideas, approaches, +improvements and implementations, it becomes totally different from +Unionfs while keeping the basic features. +Recently, Unionfs Version 2.x series begin taking some of the same +approaches to aufs1's. +Unionfs is being developed by Professor Erez Zadok at Stony Brook +University and his team. + +This version of AUFS, aufs2 has several purposes. +- to be reviewed easily and widely. +- to make the source files simpler and smaller by dropping several + original features. + +Through this work, I found some bad things in aufs1 source code and +fixed them. Some of the dropped features will be reverted in the future, +but not all I'm afraid. +Aufs2 supports linux-2.6.27 and later. If you want older kernel version +support, try aufs1 from CVS on SourceForge. + + +1. Features +---------------------------------------- +- unite several directories into a single virtual filesystem. The member + directory is called as a branch. +- you can specify the permission flags to the branch, which are 'readonly', + 'readwrite' and 'whiteout-able.' +- by upper writable branch, internal copyup and whiteout, files/dirs on + readonly branch are modifiable logically. +- dynamic branch manipulation, add, del. +- etc... + +Also there are many enhancements in aufs1, such as: +- keep inode number by external inode number table +- keep the timestamps of file/dir in internal copyup operation +- seekable directory, supporting NFS readdir. +- support mmap(2) including /proc/PID/exe symlink, without page-copy +- whiteout is hardlinked in order to reduce the consumption of inodes + on branch +- do not copyup, nor create a whiteout when it is unnecessary +- revert a single systemcall when an error occurs in aufs +- remount interface instead of ioctl +- maintain /etc/mtab by an external shell script, /sbin/mount.aufs. +- loopback mounted filesystem as a branch +- kernel thread for removing the dir who has a plenty of whiteouts +- support copyup sparse file (a file which has a 'hole' in it) +- default permission flags for branches +- selectable permission flags for ro branch, whether whiteout can + exist or not +- export via NFS. +- support /fs/aufs. +- support multiple writable branches, some policies to select one + among multiple writable branches. +- a new semantics for link(2) and rename(2) to support multiple + writable branches. +- a delegation of the internal branch access to support task I/O + accounting, which also supports Linux Security Modules (LSM) mainly + for Suse AppArmor. +- nested mount, i.e. aufs as readonly no-whiteout branch of another aufs. +- copyup-on-open or copyup-on-write +- show-whiteout mode +- show configuration even out of kernel tree +- no glibc changes are required. +- pseudo hardlink (hardlink over branches) +- allow a direct access manually to a file on branch, e.g. bypassing aufs. + including NFS or remote filesystem branch. +- and more... + +Currently these features are dropped temporary from this version, aufs2. +See design/08plan.txt in detail. +- test only the highest one for the directory permission (dirperm1) +- show whiteout mode (shwh) +- copyup on open (coo=) +- being another aufs's readonly branch (robr) +- statistics of aufs thread (/sys/fs/aufs/stat) +- delegation mode (dlgt) +- intent.open/create (file open in a single lookup) + +Features or just an idea in the future (see also design/*.txt), +- reorder the branch index without del/re-add. +- permanent xino files +- an option for refreshing the opened files after add/del branches +- 'move' policy for copy-up between two writable branches, after + checking free space. +- remount option copy/move between two branches. (almost done) +- O_DIRECT +- light version, without branch manipulation. (unnecessary?) +- copyup in userspace +- inotify in userspace +- xattr, acl + + +2. Download +---------------------------------------- +Kindly one of aufs user, the Center for Scientific Computing and Free +Software (C3SL), Federal University of Parana offered me a public GIT +tree space. + +There are three GIT trees, aufs2-2.6, aufs2-standalone and aufs2-util. +While the aufs2-util is always necessary, you need either of aufs2-2.6 +or aufs2-standalone. + +The aufs2-2.6 tree includes the whole linux-2.6 GIT tree, +git://git.kernel.org/.../torvalds/linux-2.6.git. +And you cannot select CONFIG_AUFS_FS=m for this version, eg. you cannot +build aufs2 as an externel kernel module. +If you already have linux-2.6 GIT tree, you may want to pull and merge +the "aufs2" branch from this tree. + +On the other hand, the aufs2-standalone tree has only aufs2 source files +and a necessary patch, and you can select CONFIG_AUFS_FS=m. In other +words, the aufs2-standalone tree is generated from aufs2-2.6 tree by, +- extract new files and modifications. +- generate a single patch file from modifications. +- generate a ChangeLog file from git-log. +- commit the files newly and no log messages. this is not git-pull. + +Both of aufs2-2.6 and aufs2-standalone trees have a branch whose name is +in form of "aufs2-xx" where "xx" represents the linux kernel version, +"linux-2.6.xx". + +o aufs2-2.6 tree +$ git clone --reference /your/linux-2.6/git/tree \ + http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-2.6.git \ + aufs2-2.6.git +- if you don't have linux-2.6 GIT tree, then remove "--reference ..." +$ cd aufs2-2.6.git +$ git checkout origin/aufs2-xx # for instance, aufs2-27 for linux-2.6.27 + # aufs2 (no -xx) for the latest -rc version. + +o aufs2-standalone tree +$ git clone http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-standalone.git \ + aufs2-standalone.git +$ cd aufs2-standalone.git +$ git checkout origin/aufs2-xx # for instance, aufs2-27 for linux-2.6.27 +- apply "aufs2-standalone.patch" to your kernel source files. + +o aufs2-util tree +$ git clone http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-util.git \ + aufs2-util.git +$ cd aufs2-util.git +- no particular tag/branch currently. + + +3. Configuration and Compilation +---------------------------------------- +For aufs2-2.6 tree, +- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS. +- set other aufs configurations if necessary. + +For aufs2-standalone tree, +- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS, you can select =m. +- edit fs/aufs/config.mk and set other aufs configurations if necessary. + +And then, +- build your kernel (or a module) by "make" +- install it and reboot your system +- read README in aufs2-util, build and install it + + +4. Usage +---------------------------------------- +At first, make sure aufs2-util are installed, and please read the aufs +manual, ./Documentation/filesystems/aufs/aufs.5. +$ man -l aufs.5 + +And then, +$ mkdir /tmp/rw /tmp/aufs +# mount -t aufs -o br=/tmp/rw:${HOME}=ro none /tmp/aufs + +Here is another example. The result is equivalent. +# mount -t aufs -o br=/tmp/rw:${HOME} none /tmp/aufs + Or +# mount -t aufs -o br:/tmp/rw none /tmp/aufs +# mount -o remount,append:${HOME} /tmp/aufs + +Then, you can see whole tree of your home dir through /tmp/aufs. If +you modify a file under /tmp/aufs, the one on your home directory is +not affected, instead the same named file will be newly created under +/tmp/rw. And all of your modification to a file will be applied to +the one under /tmp/rw. This is called the file based Copy on Write +(COW) method. +Aufs mount options are described in aufs.5. + +Additionally, there are some sample usages of aufs which are a +diskless system with network booting, and LiveCD over NFS. +See sample dir in CVS tree on SourceForge. + + +5. Contact +---------------------------------------- +When you have any problems or strange behaviour in aufs, please let me +know with: +- /proc/mounts (instead of the output of mount(8)) +- /sys/module/aufs/* +- /sys/fs/aufs/* (if you have them) +- /debug/aufs/* (if you have them) +- linux kernel version + if your kernel is not plain, for example modified by distributor, + the url where i can download its source is necessary too. +- aufs version which was printed at loading the module or booting the + system, instead of the date you downloaded. +- configuration (define/undefine CONFIG_AUFS_xxx) +- kernel configuration or /proc/config.gz (if you have it) +- behaviour which you think to be incorrect +- actual operation, reproducible one is better +- mailto: aufs-users at lists.sourceforge.net + +Usually, I don't watch the Public Areas(Bugs, Support Requests, Patches, +and Feature Requests) on SourceForge. Please join and write to +aufs-users ML. + + +6. Acknowledgements +---------------------------------------- +Thanks to everyone who have tried and are using aufs, whoever +have reported a bug or any feedback. + +Especially donors: +Tomas Matejicek(slax.org) made a donation (much more than once). +Dai Itasaka made a donation (2007/8). +Chuck Smith made a donation (2008/4, 10 and 12). +Henk Schoneveld made a donation (2008/9). +Chih-Wei Huang, ASUS, CTC donated Eee PC 4G (2008/10). +Francois Dupoux made a donation (2008/11). +Bruno Cesar Ribas and Luis Carlos Erpen de Bona, C3SL serves public GIT +tree (2009/2). +William Grant made a donation (2009/3). + +Thank you very much. +Donations are always, including future donations, very important and +helpful for me to keep on developing aufs. + + +7. +---------------------------------------- +If you are an experienced user, no explanation is needed. Aufs is +just a linux filesystem. + + +Enjoy! + +# Local variables: ; +# mode: text; +# End: ; diff -urN ./Documentation/filesystems/aufs/aufs.5 ./Documentation/filesystems/aufs/aufs.5 --- ./Documentation/filesystems/aufs/aufs.5 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/aufs.5 2009-04-06 10:26:45.000000000 +0200 @@ -0,0 +1,1581 @@ +.ds AUFS_VERSION aufs2 +.ds AUFS_XINO_FNAME .aufs.xino +.ds AUFS_XINO_DEFPATH /tmp/.aufs.xino +.ds AUFS_DIRWH_DEF 3 +.ds AUFS_WH_PFX .wh. +.ds AUFS_WH_PFX_LEN 4 +.ds AUFS_WKQ_NAME aufsd +.ds AUFS_NWKQ_DEF 4 +.ds AUFS_WH_DIROPQ .wh..wh..opq +.ds AUFS_WH_BASE .wh..wh.aufs +.ds AUFS_WH_PLINKDIR .wh..wh.plnk +.ds AUFS_BRANCH_MAX 127 +.ds AUFS_MFS_SECOND_DEF 30 +.\".so aufs.tmac +. +.eo +.de TQ +.br +.ns +.TP \$1 +.. +.de Bu +.IP \(bu 4 +.. +.ec +.\" end of macro definitions +. +.\" ---------------------------------------------------------------------- +.TH aufs 5 \*[AUFS_VERSION] Linux "Linux Aufs User\[aq]s Manual" +.SH NAME +aufs \- another unionfs. version \*[AUFS_VERSION] + +.\" ---------------------------------------------------------------------- +.SH DESCRIPTION +Aufs is a stackable unification filesystem such as Unionfs, which unifies +several directories and provides a merged single directory. +In the early days, aufs was entirely re-designed and re-implemented +Unionfs Version 1.x series. After +many original ideas, approaches and improvements, it +becomes totally different from Unionfs while keeping the basic features. +See Unionfs Version 1.x series for the basic features. +Recently, Unionfs Version 2.x series begin taking some of same +approaches to aufs\[aq]s. + +.\" ---------------------------------------------------------------------- +.SH MOUNT OPTIONS +At mount-time, the order of interpreting options is, +.RS +.Bu +simple flags, except xino/noxino and udba=inotify +.Bu +branches +.Bu +xino/noxino +.Bu +udba=inotify +.RE + +At remount-time, +the options are interpreted in the given order, +e.g. left to right. +.RS +.Bu +create or remove +whiteout-base(\*[AUFS_WH_BASE]) and +whplink-dir(\*[AUFS_WH_PLINKDIR]) if necessary +.RE +. +.TP +.B br:BRANCH[:BRANCH ...] (dirs=BRANCH[:BRANCH ...]) +Adds new branches. +(cf. Branch Syntax). + +Aufs rejects the branch which is an ancestor or a descendant of another +branch. It is called overlapped. When the branch is loopback-mounted +directory, aufs also checks the source fs-image file of loopback +device. If the source file is a descendant of another branch, it will +be rejected too. + +After mounting aufs or adding a branch, if you move a branch under +another branch and make it descendant of another branch, aufs will not +work correctly. +. +.TP +.B [ add | ins ]:index:BRANCH +Adds a new branch. +The index begins with 0. +Aufs creates +whiteout-base(\*[AUFS_WH_BASE]) and +whplink-dir(\*[AUFS_WH_PLINKDIR]) if necessary. + +If there is the same named file on the lower branch (larger index), +aufs will hide the lower file. +You can only see the highest file. +You will be confused if the added branch has whiteouts (including +diropq), they may or may not hide the lower entries. +.\" It is recommended to make sure that the added branch has no whiteout. + +If a process have once mapped a file by mmap(2) with MAP_SHARED +and the same named file exists on the lower branch, +the process still refers the file on the lower(hidden) +branch after adding the branch. +If you want to update the contents of a process address space after +adding, you need to restart your process or open/mmap the file again. +.\" Usually, such files are executables or shared libraries. +(cf. Branch Syntax). +. +.TP +.B del:dir +Removes a branch. +Aufs does not remove +whiteout-base(\*[AUFS_WH_BASE]) and +whplink-dir(\*[AUFS_WH_PLINKDIR]) automatically. +For example, when you add a RO branch which was unified as RW, you +will see whiteout-base or whplink-dir on the added RO branch. + +If a process is referencing the file/directory on the deleting branch +(by open, mmap, current working directory, etc.), aufs will return an +error EBUSY. +. +.TP +.B mod:BRANCH +Modifies the permission flags of the branch. +Aufs creates or removes +whiteout-base(\*[AUFS_WH_BASE]) and/or +whplink-dir(\*[AUFS_WH_PLINKDIR]) if necessary. + +If the branch permission is been changing \[oq]rw\[cq] to \[oq]ro\[cq], and a process +is mapping a file by mmap(2) +.\" with MAP_SHARED +on the branch, the process may or may not +be able to modify its mapped memory region after modifying branch +permission flags. +(cf. Branch Syntax). +. +.TP +.B append:BRANCH +equivalent to \[oq]add:(last index + 1):BRANCH\[cq]. +(cf. Branch Syntax). +. +.TP +.B prepend:BRANCH +equivalent to \[oq]add:0:BRANCH.\[cq] +(cf. Branch Syntax). +. +.TP +.B xino=filename +Use external inode number bitmap and translation table. +When CONFIG_AUFS_EXPORT is enabled, external inode generation table too. +It is set to +/\*[AUFS_XINO_FNAME] by default, or +\*[AUFS_XINO_DEFPATH]. +Comma character in filename is not allowed. + +The files are created per an aufs and per a branch filesystem, and +unlinked. So you +cannot find this file, but it exists and is read/written frequently by +aufs. +(cf. External Inode Number Bitmap, Translation Table and Generation Table). + +If you enable CONFIG_SYSFS, the path of xino files are not shown in +/proc/mounts (and /etc/mtab), instead it is shown in +/fs/aufs/si_/xi_path. +Otherwise, it is shown in /proc/mounts unless it is not the default +path. +. +.TP +.B noxino +Stop using external inode number bitmap and translation table. + +If you use this option, +Some applications will not work correctly. +.\" And pseudo link feature will not work after the inode cache is +.\" shrunk. +(cf. External Inode Number Bitmap, Translation Table and Generation Table). +. +.TP +.B trunc_xib +Truncate the external inode number bitmap file. The truncation is done +automatically when you delete a branch unless you do not specify +\[oq]notrunc_xib\[cq] option. +(cf. External Inode Number Bitmap, Translation Table and Generation Table). +. +.TP +.B notrunc_xib +Stop truncating the external inode number bitmap file when you delete +a branch. +(cf. External Inode Number Bitmap, Translation Table and Generation Table). +. +.TP +.B create_policy | create=CREATE_POLICY +.TQ +.B copyup_policy | copyup | cpup=COPYUP_POLICY +Policies to select one among multiple writable branches. The default +values are \[oq]create=tdp\[cq] and \[oq]cpup=tdp\[cq]. +link(2) and rename(2) systemcalls have an exception. In aufs, they +try keeping their operations in the branch where the source exists. +(cf. Policies to Select One among Multiple Writable Branches). +. +.TP +.B verbose | v +Print some information. +Currently, it is only busy file (or inode) at deleting a branch. +. +.TP +.B noverbose | quiet | q | silent +Disable \[oq]verbose\[cq] option. +This is default value. +. +.TP +.B sum +df(1)/statfs(2) returns the total number of blocks and inodes of +all branches. +Note that there are cases that systemcalls may return ENOSPC, even if +df(1)/statfs(2) shows that aufs has some free space/inode. +. +.TP +.B nosum +Disable \[oq]sum\[cq] option. +This is default value. +. +.TP +.B dirwh=N +Watermark to remove a dir actually at rmdir(2) and rename(2). + +If the target dir which is being removed or renamed (destination dir) +has a huge number of whiteouts, i.e. the dir is empty logically but +physically, the cost to remove/rename the single +dir may be very high. +It is +required to unlink all of whiteouts internally before issuing +rmdir/rename to the branch. +To reduce the cost of single systemcall, +aufs renames the target dir to a whiteout-ed temporary name and +invokes a pre-created +kernel thread to remove whiteout-ed children and the target dir. +The rmdir/rename systemcall returns just after kicking the thread. + +When the number of whiteout-ed children is less than the value of +dirwh, aufs remove them in a single systemcall instead of passing +another thread. +This value is ignored when the branch is NFS. +The default value is \*[AUFS_DIRWH_DEF]. +. +.TP +.B plink +.TQ +.B noplink +Specifies to use \[oq]pseudo link\[cq] feature or not. +The default is \[oq]plink\[cq] which means use this feature. +(cf. Pseudo Link) +. +.TP +.B clean_plink +Removes all pseudo-links in memory. +In order to make pseudo-link permanent, use +\[oq]auplink\[cq] utility just before one of these operations, +unmounting aufs, +using \[oq]ro\[cq] or \[oq]noplink\[cq] mount option, +deleting a branch from aufs, +adding a branch into aufs, +or changing your writable branch as readonly. +If you installed both of /sbin/mount.aufs and /sbin/umount.aufs, and your +mount(8) and umount(8) support them, +\[oq]auplink\[cq] utility will be executed automatically and flush pseudo-links. +(cf. Pseudo Link) +. +.TP +.B udba=none | reval | inotify +Specifies the level of UDBA (User\[aq]s Direct Branch Access) test. +(cf. User\[aq]s Direct Branch Access and Inotify Limitation). +. +.TP +.B diropq=whiteouted | w | always | a +Specifies whether mkdir(2) and rename(2) dir case make the created directory +\[oq]opaque\[cq] or not. +In other words, to create \[oq]\*[AUFS_WH_DIROPQ]\[cq] under the created or renamed +directory, or not to create. +When you specify diropq=w or diropq=whiteouted, aufs will not create +it if the +directory was not whiteouted or opaqued. If the directory was whiteouted +or opaqued, the created or renamed directory will be opaque. +When you specify diropq=a or diropq==always, aufs will always create +it regardless +the directory was whiteouted/opaqued or not. +The default value is diropq=w, it means not to create when it is unnecessary. +If you define CONFIG_AUFS_COMPAT at aufs compiling time, the default will be +diropq=a. +You need to consider this option if you are planning to add a branch later +since \[oq]diropq\[cq] affects the same named directory on the added branch. +. +.TP +.B warn_perm +.TQ +.B nowarn_perm +Adding a branch, aufs will issue a warning about uid/gid/permission of +the adding branch directory, +when they differ from the existing branch\[aq]s. This difference may or +may not impose a security risk. +If you are sure that there is no problem and want to stop the warning, +use \[oq]nowarn_perm\[cq] option. +The default is \[oq]warn_perm\[cq] (cf. DIAGNOSTICS). + +.\" ---------------------------------------------------------------------- +.SH Module Parameters +.TP +.B nwkq=N +The number of kernel thread named \*[AUFS_WKQ_NAME]. + +Those threads stay in the system while the aufs module is loaded, +and handle the special I/O requests from aufs. +The default value is \*[AUFS_NWKQ_DEF]. + +The special I/O requests from aufs include a part of copy-up, lookup, +directory handling, pseudo-link, xino file operations and the +delegated access to branches. +For example, Unix filesystems allow you to rmdir(2) which has no write +permission bit, if its parent directory has write permission bit. In aufs, the +removing directory may or may not have whiteout or \[oq]dir opaque\[cq] mark as its +child. And aufs needs to unlink(2) them before rmdir(2). +Therefore aufs delegates the actual unlink(2) and rmdir(2) to another kernel +thread which has been created already and has a superuser privilege. + +If you enable CONFIG_SYSFS, you can check this value through +/module/aufs/parameters/nwkq. + +. +.TP +.B brs=1 | 0 +Specifies to use the branch path data file under sysfs or not. + +If the number of your branches is large or their path is long +and you meet the limitation of mount(8) ro /etc/mtab, you need to +enable CONFIG_SYSFS and set aufs module parameter brs=1. + +When this parameter is set as 1, aufs does not show \[oq]br:\[cq] (or dirs=) +mount option through /proc/mounts (and /etc/mtab). So you can +keep yourself from the page limitation of +mount(8) or /etc/mtab. +Aufs shows branch paths through /fs/aufs/si_XXX/brNNN. +Actually the file under sysfs has also a size limitation, but I don\[aq]t +think it is harmful. + +There is one more side effect in setting 1 to this parameter. +If you rename your branch, the branch path written in /etc/mtab will be +obsoleted and the future remount will meet some error due to the +unmatched parameters (Remember that mount(8) may take the options from +/etc/mtab and pass them to the systemcall). +If you set 1, /etc/mtab will not hold the branch path and you will not +meet such trouble. On the other hand, the entires for the +branch path under sysfs are generated dynamically. So it must not be obsoleted. +But I don\[aq]t think users want to rename branches so often. + +If CONFIG_SYSFS is disable, this paramater is always set to 0. +. +.TP +.B sysrq=key +Specifies MagicSysRq key for debugging aufs. +You need to enable both of CONFIG_MAGIC_SYSRQ and CONFIG_AUFS_DEBUG. +Currently this is for developers only. +The default is \[oq]a\[cq]. +. +.TP +.B debug= 0 | 1 +Specifies disable(0) or enable(1) debug print in aufs. +This parameter can be changed dynamically. +You need to enable CONFIG_AUFS_DEBUG. +Currently this is for developers only. +The default is \[oq]0\[cq] (disable). + +.\" ---------------------------------------------------------------------- +.SH Entries under Sysfs and Debugfs +See linux/Documentation/ABI/*/{sys,debug}fs-aufs. + +.\" ---------------------------------------------------------------------- +.SH Branch Syntax +.TP +.B dir_path[ =permission [ + attribute ] ] +.TQ +.B permission := rw | ro | rr +.TQ +.B attribute := wh | nolwh +dir_path is a directory path. +The keyword after \[oq]dir_path=\[cq] is a +permission flags for that branch. +Comma, colon and the permission flags string (including \[oq]=\[cq])in the path +are not allowed. + +Any filesystem can be a branch, But some are not accepted such like +sysfs, procfs and unionfs. +If you specify such filesystems as an aufs branch, aufs will return an error +saying it is unsupported. + +Cramfs in linux stable release has strange inodes and it makes aufs +confused. For example, +.nf +$ mkdir -p w/d1 w/d2 +$ > w/z1 +$ > w/z2 +$ mkcramfs w cramfs +$ sudo mount -t cramfs -o ro,loop cramfs /mnt +$ find /mnt -ls + 76 1 drwxr-xr-x 1 jro 232 64 Jan 1 1970 /mnt + 1 1 drwxr-xr-x 1 jro 232 0 Jan 1 1970 /mnt/d1 + 1 1 drwxr-xr-x 1 jro 232 0 Jan 1 1970 /mnt/d2 + 1 1 -rw-r--r-- 1 jro 232 0 Jan 1 1970 /mnt/z1 + 1 1 -rw-r--r-- 1 jro 232 0 Jan 1 1970 /mnt/z2 +.fi + +All these two directories and two files have the same inode with one +as their link count. Aufs cannot handle such inode correctly. +Currently, aufs involves a tiny workaround for such inodes. But some +applications may not work correctly since aufs inode number for such +inode will change silently. +If you do not have any empty files, empty directories or special files, +inodes on cramfs will be all fine. + +A branch should not be shared as the writable branch between multiple +aufs. A readonly branch can be shared. + +The maximum number of branches is configurable at compile time. +The current value is \*[AUFS_BRANCH_MAX] which depends upon +configuration. + +When an unknown permission or attribute is given, aufs sets ro to that +branch silently. + +.SS Permission +. +.TP +.B rw +Readable and writable branch. Set as default for the first branch. +If the branch filesystem is mounted as readonly, you cannot set it \[oq]rw.\[cq] +.\" A filesystem which does not support link(2) and i_op\->setattr(), for +.\" example FAT, will not be used as the writable branch. +. +.TP +.B ro +Readonly branch and it has no whiteouts on it. +Set as default for all branches except the first one. Aufs never issue +both of write operation and lookup operation for whiteout to this branch. +. +.TP +.B rr +Real readonly branch, special case of \[oq]ro\[cq], for natively readonly +branch. Assuming the branch is natively readonly, aufs can optimize +some internal operation. For example, if you specify \[oq]udba=inotify\[cq] +option, aufs does not set inotify for the things on rr branch. +Set by default for a branch whose fs-type is either \[oq]iso9660\[cq], +\[oq]cramfs\[cq] or \[oq]romfs\[cq]. + +When your branch exists on slower device and you have some +capacity on your hdd, you may want to try ulobdev tool in ULOOP sample. +It can cache the contents of the real devices on another faster device, +so you will be able to get the better access performance. +The ulobdev tool is for a generic block device, and the ulohttp is for a +filesystem image on http server. +If you want to spin down your hdd to save the +battery life or something, then you may want to use ulobdev to save the +access to the hdd, too. +See $AufsCVS/sample/uloop in detail. + +.SS Attribute +. +.TP +.B wh +Readonly branch and it has/might have whiteouts on it. +Aufs never issue write operation to this branch, but lookup for whiteout. +Use this as \[oq]=ro+wh\[cq]. +. +.TP +.B nolwh +Usually, aufs creates a whiteout as a hardlink on a writable +branch. This attributes prohibits aufs to create the hardlinked +whiteout, including the source file of all hardlinked whiteout +(\*[AUFS_WH_BASE].) +If you do not like a hardlink, or your writable branch does not support +link(2), then use this attribute. +But I am afraid a filesystem which does not support link(2) natively +will fail in other place such as copy-up. +Use this as \[oq]=rw+nolwh\[cq]. +Also you may want to try \[oq]noplink\[cq] mount option, while it is not recommended. + +.\" .SS FUSE as a branch +.\" A FUSE branch needs special attention. +.\" The struct fuse_operations has a statfs operation. It is OK, but the +.\" parameter is struct statvfs* instead of struct statfs*. So almost +.\" all user\-space implementaion will call statvfs(3)/fstatvfs(3) instead of +.\" statfs(2)/fstatfs(2). +.\" In glibc, [f]statvfs(3) issues [f]statfs(2), open(2)/read(2) for +.\" /proc/mounts, +.\" and stat(2) for the mountpoint. With this situation, a FUSE branch will +.\" cause a deadlock in creating something in aufs. Here is a sample +.\" scenario, +.\" .\" .RS +.\" .\" .IN -10 +.\" .Bu +.\" create/modify a file just under the aufs root dir. +.\" .Bu +.\" aufs aquires a write\-lock for the parent directory, ie. the root dir. +.\" .Bu +.\" A library function or fuse internal may call statfs for a fuse branch. +.\" The create=mfs mode in aufs will surely call statfs for each writable +.\" branches. +.\" .Bu +.\" FUSE in kernel\-space converts and redirects the statfs request to the +.\" user\-space. +.\" .Bu +.\" the user\-space statfs handler will call [f]statvfs(3). +.\" .Bu +.\" the [f]statvfs(3) in glibc will access /proc/mounts and issue +.\" stat(2) for the mountpoint. But those require a read\-lock for the aufs +.\" root directory. +.\" .Bu +.\" Then a deadlock occurs. +.\" .\" .RE 1 +.\" .\" .IN +.\" +.\" In order to avoid this deadlock, I would suggest not to call +.\" [f]statvfs(3) from fuse. Here is a sample code to do this. +.\" .nf +.\" struct statvfs stvfs; +.\" +.\" main() +.\" { +.\" statvfs(..., &stvfs) +.\" or +.\" fstatvfs(..., &stvfs) +.\" stvfs.f_fsid = 0 +.\" } +.\" +.\" statfs_handler(const char *path, struct statvfs *arg) +.\" { +.\" struct statfs stfs +.\" +.\" memcpy(arg, &stvfs, sizeof(stvfs)) +.\" +.\" statfs(..., &stfs) +.\" or +.\" fstatfs(..., &stfs) +.\" +.\" arg->f_bfree = stfs.f_bfree +.\" arg->f_bavail = stfs.f_bavail +.\" arg->f_ffree = stfs.f_ffree +.\" arg->f_favail = /* any value */ +.\" } +.\" .fi + +.\" ---------------------------------------------------------------------- +.SH External Inode Number Bitmap, Translation Table and Generation Table (xino) +Aufs uses one external bitmap file and one external inode number +translation table files per an aufs and per a branch +filesystem by default. +Additionally when CONFIG_AUFS_EXPORT is enabled, one external inode +generation table is added. +The bitmap (and the generation table) is for recycling aufs inode number +and the others +are a table for converting an inode number on a branch to +an aufs inode number. The default path +is \[oq]first writable branch\[cq]/\*[AUFS_XINO_FNAME]. +If there is no writable branch, the +default path +will be \*[AUFS_XINO_DEFPATH]. +.\" A user who executes mount(8) needs the privilege to create xino +.\" file. + +If you enable CONFIG_SYSFS, the path of xino files are not shown in +/proc/mounts (and /etc/mtab), instead it is shown in +/fs/aufs/si_/xi_path. +Otherwise, it is shown in /proc/mounts unless it is not the default +path. + +Those files are always opened and read/write by aufs frequently. +If your writable branch is on flash memory device, it is recommended +to put xino files on other than flash memory by specifying \[oq]xino=\[cq] +mount option. + +The +maximum file size of the bitmap is, basically, the amount of the +number of all the files on all branches divided by 8 (the number of +bits in a byte). +For example, on a 4KB page size system, if you have 32,768 (or +2,599,968) files in aufs world, +then the maximum file size of the bitmap is 4KB (or 320KB). + +The +maximum file size of the table will +be \[oq]max inode number on the branch x size of an inode number\[cq]. +For example in 32bit environment, + +.nf +$ df -i /branch_fs +/dev/hda14 2599968 203127 2396841 8% /branch_fs +.fi + +and /branch_fs is an branch of the aufs. When the inode number is +assigned contiguously (without \[oq]hole\[cq]), the maximum xino file size for +/branch_fs will be 2,599,968 x 4 bytes = about 10 MB. But it might not be +allocated all of disk blocks. +When the inode number is assigned discontinuously, the maximum size of +xino file will be the largest inode number on a branch x 4 bytes. +Additionally, the file size is limited to LLONG_MAX or the s_maxbytes +in filesystem\[aq]s superblock (s_maxbytes may be smaller than +LLONG_MAX). So the +support-able largest inode number on a branch is less than +2305843009213693950 (LLONG_MAX/4\-1). +This is the current limitation of aufs. +On 64bit environment, this limitation becomes more strict and the +supported largest inode number is less than LLONG_MAX/8\-1. + +The xino files are always hidden, i.e. removed. So you cannot +do \[oq]ls \-l xino_file\[cq]. +If you enable CONFIG_DEBUG_FS, you can check these information through +/aufs//{xib,xi[0-9]*,xigen}. xib is for the bitmap file, +xi0 ix for the first branch, and xi1 is for the next. xigen is for the +generation table. +xib and xigen are in the format of, + +.nf +x +.fi + +Note that a filesystem usually has a +feature called pre-allocation, which means a number of +blocks are allocated automatically, and then deallocated +silently when the filesystem thinks they are unnecessary. +You do not have to be surprised the sudden changes of the number of +blocks, when your filesystem which xino files are placed supports the +pre-allocation feature. + +The rests are hidden xino file information in the format of, + +.nf +, x +.fi + +If the file count is larger than 1, it means some of your branches are +on the same filesystem and the xino file is shared by them. +Note that the file size may not be equal to the actual consuming blocks +since xino file is a sparse file, i.e. a hole in a file which does not +consume any disk blocks. + +Once you unmount aufs, the xino files for that aufs are totally gone. +It means that the inode number is not permanent across umount or +shutdown. + +The xino files should be created on the filesystem except NFS. +If your first writable branch is NFS, you will need to specify xino +file path other than NFS. +Also if you are going to remove the branch where xino files exist or +change the branch permission to readonly, you need to use xino option +before del/mod the branch. + +The bitmap file can be truncated. +For example, if you delete a branch which has huge number of files, +many inode numbers will be recycled and the bitmap will be truncated +to smaller size. Aufs does this automatically when a branch is +deleted. +You can truncate it anytime you like if you specify \[oq]trunc_xib\[cq] mount +option. But when the accessed inode number was not deleted, nothing +will be truncated. +If you do not want to truncate it (it may be slow) when you delete a +branch, specify \[oq]notrunc_xib\[cq] after \[oq]del\[cq] mount option. + +If you do not want to use xino, use noxino mount option. Use this +option with care, since the inode number may be changed silently and +unexpectedly anytime. +For example, +rmdir failure, recursive chmod/chown/etc to a large and deep directory +or anything else. +And some applications will not work correctly. +.\" When the inode number has been changed, your system +.\" can be crazy. +If you want to change the xino default path, use xino mount option. + +After you add branches, the persistence of inode number may not be +guaranteed. +At remount time, cached but unused inodes are discarded. +And the newly appeared inode may have different inode number at the +next access time. The inodes in use have the persistent inode number. + +When aufs assigned an inode number to a file, and if you create the +same named file on the upper branch directly, then the next time you +access the file, aufs may assign another inode number to the file even +if you use xino option. +Some applications may treat the file whose inode number has been +changed as totally different file. + +.\" ---------------------------------------------------------------------- +.SH Pseudo Link (hardlink over branches) +Aufs supports \[oq]pseudo link\[cq] which is a logical hard-link over +branches (cf. ln(1) and link(2)). +In other words, a copied-up file by link(2) and a copied-up file which was +hard-linked on a readonly branch filesystem. + +When you have files named fileA and fileB which are +hardlinked on a readonly branch, if you write something into fileA, +aufs copies-up fileA to a writable branch, and write(2) the originally +requested thing to the copied-up fileA. On the writable branch, +fileA is not hardlinked. +But aufs remembers it was hardlinked, and handles fileB as if it existed +on the writable branch, by referencing fileA\[aq]s inode on the writable +branch as fileB\[aq]s inode. + +Once you unmount aufs, the plink info for that aufs kept in memory are totally +gone. +It means that the pseudo-link is not permanent. +If you want to make plink permanent, try \[oq]auplink\[cq] utility just before +one of these operations, +unmounting your aufs, +using \[oq]ro\[cq] or \[oq]noplink\[cq] mount option, +deleting a branch from aufs, +adding a branch into aufs, +or changing your writable branch to readonly. + +This utility will reproduces all real hardlinks on a writable branch by linking +them, and removes pseudo-link info in memory and temporary link on the +writable branch. +Since this utility access your branches directly, you cannot hide them by +\[oq]mount \-\-bind /tmp /branch\[cq] or something. + +If you are willing to rebuild your aufs with the same branches later, you +should use auplink utility before you umount your aufs. +If you installed both of /sbin/mount.aufs and /sbin/umount.aufs, and your +mount(8) and umount(8) support them, +\[oq]auplink\[cq] utility will be executed automatically and flush pseudo-links. + +.nf +# auplink /your/aufs/root flush +# umount /your/aufs/root +or +# auplink /your/aufs/root flush +# mount -o remount,mod:/your/writable/branch=ro /your/aufs/root +or +# auplink /your/aufs/root flush +# mount -o remount,noplink /your/aufs/root +or +# auplink /your/aufs/root flush +# mount -o remount,del:/your/aufs/branch /your/aufs/root +or +# auplink /your/aufs/root flush +# mount -o remount,append:/your/aufs/branch /your/aufs/root +.fi + +The plinks are kept both in memory and on disk. When they consumes too much +resources on your system, you can use the \[oq]auplink\[cq] utility at anytime and +throw away the unnecessary pseudo-links in safe. + +Additionally, the \[oq]auplink\[cq] utility is very useful for some security reasons. +For example, when you have a directory whose permission flags +are 0700, and a file who is 0644 under the 0700 directory. Usually, +all files under the 0700 directory are private and no one else can see +the file. But when the directory is 0711 and someone else knows the 0644 +filename, he can read the file. + +Basically, aufs pseudo-link feature creates a temporary link under the +directory whose owner is root and the permission flags are 0700. +But when the writable branch is NFS, aufs sets 0711 to the directory. +When the 0644 file is pseudo-linked, the temporary link, of course the +contents of the file is totally equivalent, will be created under the +0711 directory. The filename will be generated by its inode number. +While it is hard to know the generated filename, someone else may try peeping +the temporary pseudo-linked file by his software tool which may try the name +from one to MAX_INT or something. +In this case, the 0644 file will be read unexpectedly. +I am afraid that leaving the temporary pseudo-links can be a security hole. +It makes sense to execute \[oq]auplink /your/aufs/root flush\[cq] +periodically, when your writable branch is NFS. + +When your writable branch is not NFS, or all users are careful enough to set 0600 +to their private files, you do not have to worry about this issue. + +If you do not want this feature, use \[oq]noplink\[cq] mount option. + +.SS The behaviours of plink and noplink +This sample shows that the \[oq]f_src_linked2\[cq] with \[oq]noplink\[cq] option cannot follow +the link. + +.nf +none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,br:/dev/shm/rw=rw:/dev/shm/ro=ro) +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +ls: ./copied: No such file or directory +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked +22 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2 +$ echo abc >> f_src_linked +$ cp f_src_linked copied +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +15 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +36 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ../rw/f_src_linked +53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied +22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked +22 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2 +$ cmp copied f_src_linked2 +$ + +none on /dev/shm/u type aufs (rw,xino=/dev/shm/rw/.aufs.xino,noplink,br:/dev/shm/rw=rw:/dev/shm/ro=ro) +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +ls: ./copied: No such file or directory +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked +23 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ./f_src_linked2 +$ echo abc >> f_src_linked +$ cp f_src_linked copied +$ ls -li ../r?/f_src_linked* ./f_src_linked* ./copied +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked +17 -rw-r--r-- 2 jro jro 2 Dec 22 11:03 ../ro/f_src_linked2 +36 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ../rw/f_src_linked +53 -rw-r--r-- 1 jro jro 6 Dec 22 11:03 ./copied +23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked +23 -rw-r--r-- 2 jro jro 6 Dec 22 11:03 ./f_src_linked2 +$ cmp copied f_src_linked2 +cmp: EOF on f_src_linked2 +$ +.fi + +.\" +.\" If you add/del a branch, or link/unlink the pseudo-linked +.\" file on a branch +.\" directly, aufs cannot keep the correct link count, but the status of +.\" \[oq]pseudo-linked.\[cq] +.\" Those files may or may not keep the file data after you unlink the +.\" file on the branch directly, especially the case of your branch is +.\" NFS. + +If you add a branch which has fileA or fileB, aufs does not follow the +pseudo link. The file on the added branch has no relation to the same +named file(s) on the lower branch(es). +If you use noxino mount option, pseudo link will not work after the +kernel shrinks the inode cache. + +This feature will not work for squashfs before version 3.2 since its +inode is tricky. +When the inode is hardlinked, squashfs inodes has the same inode +number and correct link count, but the inode memory object is +different. Squashfs inodes (before v3.2) are generated for each, even +they are hardlinked. + +.\" ---------------------------------------------------------------------- +.SH User\[aq]s Direct Branch Access (UDBA) +UDBA means a modification to a branch filesystem manually or directly, +e.g. bypassing aufs. +While aufs is designed and implemented to be safe after UDBA, +it can make yourself and your aufs confused. And some information like +aufs inode will be incorrect. +For example, if you rename a file on a branch directly, the file on +aufs may +or may not be accessible through both of old and new name. +Because aufs caches various information about the files on +branches. And the cache still remains after UDBA. + +Aufs has a mount option named \[oq]udba\[cq] which specifies the test level at +access time whether UDBA was happened or not. +. +.TP +.B udba=none +Aufs trusts the dentry and the inode cache on the system, and never +test about UDBA. With this option, aufs runs fastest, but it may show +you incorrect data. +Additionally, if you often modify a branch +directly, aufs will not be able to trace the changes of inodes on the +branch. It can be a cause of wrong behaviour, deadlock or anything else. + +It is recommended to use this option only when you are sure that +nobody access a file on a branch. +It might be difficult for you to achieve real \[oq]no UDBA\[cq] world when you +cannot stop your users doing \[oq]find / \-ls\[cq] or something. +If you really want to forbid all of your users to UDBA, here is a trick +for it. +With this trick, users cannot see the +branches directly and aufs runs with no problem, except \[oq]auplink\[cq] utility. +But if you are not familiar with aufs, this trick may make +yourself confused. + +.nf +# d=/tmp/.aufs.hide +# mkdir $d +# for i in $branches_you_want_to_hide +> do +> mount -n --bind $d $i +> done +.fi + +When you unmount the aufs, delete/modify the branch by remount, or you +want to show the hidden branches again, unmount the bound +/tmp/.aufs.hide. + +.nf +# umount -n $branches_you_want_to_unbound +.fi + +If you use FUSE filesystem as an aufs branch which supports hardlink, +you should not set this option, since FUSE makes inode objects for +each hardlinks (at least in linux\-2.6.23). When your FUSE filesystem +maintains them at link/unlinking, it is equivalent +to \[oq]direct branch access\[cq] for aufs. + +. +.TP +.B udba=reval +Aufs tests only the existence of the file which existed. If +the existed file was removed on the branch directly, aufs +discard the cache about the file and +re-lookup it. So the data will be updated. +This test is at minimum level to keep the performance and ensure the +existence of a file. +This is default and aufs runs still fast. + +This rule leads to some unexpected situation, but I hope it is +harmless. Those are totally depends upon cache. Here are just a few +examples. +. +.RS +.Bu +If the file is cached as negative or +not-existed, aufs does not test it. And the file is still handled as +negative after a user created the file on a branch directly. If the +file is not cached, aufs will lookup normally and find the file. +. +.Bu +When the file is cached as positive or existed, and a user created the +same named file directly on the upper branch. Aufs detects the cached +inode of the file is still existing and will show you the old (cached) +file which is on the lower branch. +. +.Bu +When the file is cached as positive or existed, and a user renamed the +file by rename(2) directly. Aufs detects the inode of the file is +still existing. You may or may not see both of the old and new files. +Todo: If aufs also tests the name, we can detect this case. +.RE + +If your outer modification (UDBA) is rare and you can ignore the +temporary and minor differences between virtual aufs world and real +branch filesystem, then try this mount option. +. +.TP +.B udba=inotify +Aufs sets `inotify' to all the accessed directories on its branches +and receives the event about the dir and its children. It consumes +resources, cpu and memory. And I am afraid that the performance will be +hurt, but it is most strict test level. +There are some limitations of linux inotify, see also Inotify +Limitation. +So it is recommended to leave udba default option usually, and set it +to inotify by remount when you need it. + +When a user accesses the file which was notified UDBA before, the cached data +about the file will be discarded and aufs re-lookup it. So the data will +be updated. +When an error condition occurs between UDBA and aufs operation, aufs +will return an error, including EIO. +To use this option, you need to enable CONFIG_INOTIFY and +CONFIG_AUFS_UDBA_INOTIFY. + +To rename/rmdir a directory on a branch directory may reveal the same named +directory on the lower branch. Aufs tries re-lookuping the renamed +directory and the revealed directory and assigning different inode +number to them. But the inode number including their children can be a +problem. The inode numbers will be changed silently, and +aufs may produce a warning. If you rename a directory repeatedly and +reveal/hide the lower directory, then aufs may confuse their inode +numbers too. It depends upon the system cache. + +When you make a directory in aufs and mount other filesystem on it, +the directory in aufs cannot be removed expectedly because it is a +mount point. But the same named directory on the writable branch can +be removed, if someone wants. It is just an empty directory, instead +of a mount point. +Aufs cannot stop such direct rmdir, but produces a warning about it. + +If the pseudo-linked file is hardlinked or unlinked on the branch +directly, its inode link count in aufs may be incorrect. It is +recommended to flush the psuedo-links by auplink script. + +.\" ---------------------------------------------------------------------- +.SH Linux Inotify Limitation +Unfortunately, current inotify (linux\-2.6.18) has some limitations, +and aufs must derive it. + +.SS IN_ATTRIB, updating atime +When a file/dir on a branch is accessed directly, the inode atime (access +time, cf. stat(2)) may or may not be updated. In some cases, inotify +does not fire this event. So the aufs inode atime may remain old. + +.SS IN_ATTRIB, updating nlink +When the link count of a file on a branch is incremented by link(2) +directly, +inotify fires IN_CREATE to the parent +directory, but IN_ATTRIB to the file. So the aufs inode nlink may +remain old. + +.SS IN_DELETE, removing file on NFS +When a file on a NFS branch is deleted directly, inotify may or may +not fire +IN_DELETE event. It depends upon the status of dentry +(DCACHE_NFSFS_RENAMED flag). +In this case, the file on aufs seems still exists. Aufs and any user can see +the file. + +.SS IN_IGNORED, deleted rename target +When a file/dir on a branch is unlinked by rename(2) directly, inotify +fires IN_IGNORED which means the inode is deleted. Actually, in some +cases, the inode survives. For example, the rename target is linked or +opened. In this case, inotify watch set by aufs is removed by VFS and +inotify. +And aufs cannot receive the events anymore. So aufs may show you +incorrect data about the file/dir. + +.\" ---------------------------------------------------------------------- +.SH Copy On Write, or aufs internal copyup and copydown +Every stackable filesystem which implements copy\-on\-write supports the +copyup feature. The feature is to copy a file/dir from the lower branch +to the upper internally. When you have one readonly branch and one +upper writable branch, and you append a string to a file which exists on +the readonly branch, then aufs will copy the file from the readonly +branch to the writable branch with its directory hierarchy. It means one +write(2) involves several logical/internal mkdir(2), creat(2), read(2), +write(2) and close(2) systemcalls +before the actual expected write(2) is performed. Sometimes it may take +a long time, particulary when the file is very large. +If CONFIG_AUFS_DEBUG is enabled, aufs produces a message saying `copying +a large file.\[aq] + +You may see the message when you change the xino file path or +truncate the xino/xib files. Sometimes those files can be large and may +take a long time to handle them. + +.\" ---------------------------------------------------------------------- +.SH Policies to Select One among Multiple Writable Branches +Aufs has some policies to select one among multiple writable branches +when you are going to write/modify something. There are two kinds of +policies, one is for newly create something and the other is for +internal copy-up. +You can select them by specifying mount option \[oq]create=CREATE_POLICY\[cq] +or \[oq]cpup=COPYUP_POLICY.\[cq] +These policies have no meaning when you have only one writable +branch. If there is some meaning, it must hurt the performance. + +.SS Exceptions for Policies +In every cases below, even if the policy says that the branch where a +new file should be created is /rw2, the file will be created on /rw1. +. +.Bu +If there is a readonly branch with \[oq]wh\[cq] attribute above the +policy-selected branch and the parent dir is marked as opaque, +or the target (creating) file is whiteouted on the ro+wh branch, then +the policy will be ignored and the target file will be created on the +nearest upper writable branch than the ro+wh branch. +.RS +.nf +/aufs = /rw1 + /ro+wh/diropq + /rw2 +/aufs = /rw1 + /ro+wh/wh.tgt + /rw2 +.fi +.RE +. +.Bu +If there is a writable branch above the policy-selected branch and the +parent dir is marked as opaque or the target file is whiteouted on the +branch, then the policy will be ignored and the target file will be +created on the highest one among the upper writable branches who has +diropq or whiteout. In case of whiteout, aufs removes it as usual. +.RS +.nf +/aufs = /rw1/diropq + /rw2 +/aufs = /rw1/wh.tgt + /rw2 +.fi +.RE +. +.Bu +link(2) and rename(2) systemcalls are exceptions in every policy. +They try selecting the branch where the source exists as possible since +copyup a large file will take long time. If it can\[aq]t be, ie. the +branch where the source exists is readonly, then they will follow the +copyup policy. +. +.Bu +There is an exception for rename(2) when the target exists. +If the rename target exists, aufs compares the index of the branches +where the source and the target are existing and selects the higher +one. If the selected branch is readonly, then aufs follows the copyup +policy. + +.SS Policies for Creating +. +.TP +.B create=tdp | top\-down\-parent +Selects the highest writable branch where the parent dir exists. If +the parent dir does not exist on a writable branch, then the internal +copyup will happen. The policy for this copyup is always \[oq]bottom-up.\[cq] +This is the default policy. +. +.TP +.B create=rr | round\-robin +Selects a writable branch in round robin. When you have two writable +branches and creates 10 new files, 5 files will be created for each +branch. +mkdir(2) systemcall is an exception. When you create 10 new directories, +all are created on the same branch. +. +.TP +.B create=mfs[:second] | most\-free\-space[:second] +Selects a writable branch which has most free space. In order to keep +the performance, you can specify the duration (\[oq]second\[cq]) which makes +aufs hold the index of last selected writable branch until the +specified seconds expires. The first time you create something in aufs +after the specified seconds expired, aufs checks the amount of free +space of all writable branches by internal statfs call +and the held branch index will be updated. +The default value is \*[AUFS_MFS_SECOND_DEF] seconds. +. +.TP +.B create=mfsrr:low[:second] +Selects a writable branch in most-free-space mode first, and then +round-robin mode. If the selected branch has less free space than the +specified value \[oq]low\[cq] in bytes, then aufs re-tries in round-robin mode. +.\" \[oq]G\[cq], \[oq]M\[cq] and \[oq]K\[cq] (case insensitive) can be followed after \[oq]low.\[cq] Or +Try an arithmetic expansion of shell which is defined by POSIX. +For example, $((10 * 1024 * 1024)) for 10M. +You can also specify the duration (\[oq]second\[cq]) which is equivalent to +the \[oq]mfs\[cq] mode. +. +.TP +.B create=pmfs[:second] +Selects a writable branch where the parent dir exists, such as tdp +mode. When the parent dir exists on multiple writable branches, aufs +selects the one which has most free space, such as mfs mode. + +.SS Policies for Copy-Up +. +.TP +.B cpup=tdp | top\-down\-parent +Equivalent to the same named policy for create. +This is the default policy. +. +.TP +.B cpup=bup | bottom\-up\-parent +Selects the writable branch where the parent dir exists and the branch +is nearest upper one from the copyup-source. +. +.TP +.B cpup=bu | bottom\-up +Selects the nearest upper writable branch from the copyup-source, +regardless the existence of the parent dir. + +.\" ---------------------------------------------------------------------- +.SH Exporting Aufs via NFS +Aufs is supporting NFS-exporting. +Since aufs has no actual block device, you need to add NFS \[oq]fsid\[cq] option at +exporting. Refer to the manual of NFS about the detail of this option. + +There are some limitations or requirements. +.RS +.Bu +The branch filesystem must support NFS-exporting. +.Bu +NFSv2 is not supported. When you mount the exported aufs from your NFS +client, you will need to some NFS options like v3 or nfsvers=3, +especially if it is nfsroot. +.Bu +If the size of the NFS file handle on your branch filesystem is large, +aufs will +not be able to handle it. The maximum size of NFSv3 file +handle for a filesystem is 64 bytes. Aufs uses 24 bytes for 32bit +system, plus 12 bytes for 64bit system. The rest is a room for a file +handle of a branch filesystem. +.Bu +The External Inode Number Bitmap, Translation Table and Generation Table +(xino) is +required since NFS file +handle is based upon inode number. The mount option \[oq]xino\[cq] is enabled +by default. +The external inode generation table and its debugfs entry +(/aufs/si_*/xigen) is created when CONFIG_AUFS_EXPORT is +enabled even if you don\[aq]t export aufs actually. +The size of the external inode generation table grows only, never be +truncated. You might need to pay attention to the free space of the +filesystem where xino files are placed. By default, it is the first +writable branch. +.Bu +The branch filesystems must be accessible, which means \[oq]not hidden.\[cq] +It means you need to \[oq]mount \-\-move\[cq] when you use initramfs and +switch_root(8), or chroot(8). +.RE + +.\" ---------------------------------------------------------------------- +.SH Dentry and Inode Caches +If you want to clear caches on your system, there are several tricks +for that. If your system ram is low, +try \[oq]find /large/dir \-ls > /dev/null\[cq]. +It will read many inodes and dentries and cache them. Then old caches will be +discarded. +But when you have large ram or you do not have such large +directory, it is not effective. + +If you want to discard cache within a certain filesystem, +try \[oq]mount \-o remount /your/mntpnt\[cq]. Some filesystem may return an error of +EINVAL or something, but VFS discards the unused dentry/inode caches on the +specified filesystem. + +.\" ---------------------------------------------------------------------- +.SH Compatible/Incompatible with Unionfs Version 1.x Series +If you compile aufs with \-DCONFIG_AUFS_COMPAT, dirs= option and =nfsro +branch permission flag are available. They are interpreted as +br: option and =ro flags respectively. + \[oq]debug\[cq], \[oq]delete\[cq], \[oq]imap\[cq] options are ignored silently. When you +compile aufs without \-DCONFIG_AUFS_COMPAT, these three options are +also ignored, but a warning message is issued. + +Ignoring \[oq]delete\[cq] option, and to keep filesystem consistency, aufs tries +writing something to only one branch in a single systemcall. It means +aufs may copyup even if the copyup-src branch is specified as writable. +For example, you have two writable branches and a large regular file +on the lower writable branch. When you issue rename(2) to the file on aufs, +aufs may copyup it to the upper writable branch. +If this behaviour is not what you want, then you should rename(2) it +on the lower branch directly. + +And there is a simple shell +script \[oq]unionctl\[cq] under sample subdirectory, which is compatible with +unionctl(8) in +Unionfs Version 1.x series, except \-\-query action. +This script executes mount(8) with \[oq]remount\[cq] option and uses +add/del/mod aufs mount options. +If you are familiar with Unionfs Version 1.x series and want to use unionctl(8), you can +try this script instead of using mount \-o remount,... directly. +Aufs does not support ioctl(2) interface. +This script is highly depending upon mount(8) in +util\-linux\-2.12p package, and you need to mount /proc to use this script. +If your mount(8) version differs, you can try modifying this +script. It is very easy. +The unionctl script is just for a sample usage of aufs remount +interface. + +Aufs uses the external inode number bitmap and translation table by +default. + +The default branch permission for the first branch is \[oq]rw\[cq], and the +rest is \[oq]ro.\[cq] + +The whiteout is for hiding files on lower branches. Also it is applied +to stop readdir going lower branches. +The latter case is called \[oq]opaque directory.\[cq] Any +whiteout is an empty file, it means whiteout is just an mark. +In the case of hiding lower files, the name of whiteout is +\[oq]\*[AUFS_WH_PFX].\[cq] +And in the case of stopping readdir, the name is +\[oq]\*[AUFS_WH_PFX]\*[AUFS_WH_PFX].opq\[cq] or +\[oq]\*[AUFS_WH_PFX]__dir_opaque.\[cq] The name depends upon your compile +configuration +CONFIG_AUFS_COMPAT. +.\" All of newly created or renamed directory will be opaque. +All whiteouts are hardlinked, +including \[oq]/\*[AUFS_WH_BASE].\[cq] + +The hardlink on an ordinary (disk based) filesystem does not +consume inode resource newly. But in linux tmpfs, the number of free +inodes will be decremented by link(2). It is recommended to specify +nr_inodes option to your tmpfs if you meet ENOSPC. Use this option +after checking by \[oq]df \-i.\[cq] + +When you rmdir or rename-to the dir who has a number of whiteouts, +aufs rename the dir to the temporary whiteouted-name like +\[oq]\*[AUFS_WH_PFX]..\[cq] Then remove it after actual operation. +cf. mount option \[oq]dirwh.\[cq] + +.\" ---------------------------------------------------------------------- +.SH Incompatible with an Ordinary Filesystem +stat(2) returns the inode info from the first existence inode among +the branches, except the directory link count. +Aufs computes the directory link count larger than the exact value usually, in +order to keep UNIX filesystem semantics, or in order to shut find(1) mouth up. +The size of a directory may be wrong too, but it has to do no harm. +The timestamp of a directory will not be updated when a file is +created or removed under it, and it was done on a lower branch. + +The test for permission bits has two cases. One is for a directory, +and the other is for a non-directory. In the case of a directory, aufs +checks the permission bits of all existing directories. It means you +need the correct privilege for the directories including the lower +branches. +The test for a non-directory is more simple. It checks only the +topmost inode. + +statfs(2) returns the information of the first branch info except +namelen when \[oq]nosum\[cq] is specified (the default). The namelen is +decreased by the whiteout prefix length. And the block size may differ +from st_blksize which is obtained by stat(2). + +Remember, seekdir(3) and telldir(3) are not defined in POSIX. They may +not work as you expect. Try rewinddir(3) or re-open the dir. + +The whiteout prefix (\*[AUFS_WH_PFX]) is reserved on all branches. Users should +not handle the filename begins with this prefix. +In order to future whiteout, the maxmum filename length is limited by +the longest value \- \*[AUFS_WH_PFX_LEN]. It may be a violation of POSIX. + +If you dislike the difference between the aufs entries in /etc/mtab +and /proc/mounts, and if you are using mount(8) in util\-linux package, +then try ./mount.aufs utility. Copy the script to /sbin/mount.aufs. +This simple utility tries updating +/etc/mtab. If you do not care about /etc/mtab, you can ignore this +utility. +Remember this utility is highly depending upon mount(8) in +util\-linux\-2.12p package, and you need to mount /proc. + +Since aufs uses its own inode and dentry, your system may cache huge +number of inodes and dentries. It can be as twice as all of the files +in your union. +It means that unmounting or remounting readonly at shutdown time may +take a long time, since mount(2) in VFS tries freeing all of the cache +on the target filesystem. + +When you open a directory, aufs will open several directories +internally. +It means you may reach the limit of the number of file descriptor. +And when the lower directory cannot be opened, aufs will close all the +opened upper directories and return an error. + +The sub-mount under the branch +of local filesystem +is ignored. +For example, if you have mount another filesystem on +/branch/another/mntpnt, the files under \[oq]mntpnt\[cq] will be ignored by aufs. +It is recommended to mount the sub-mount under the mounted aufs. +For example, + +.nf +# sudo mount /dev/sdaXX /ro_branch +# d=another/mntpnt +# sudo mount /dev/sdbXX /ro_branch/$d +# mkdir -p /rw_branch/$d +# sudo mount -t aufs -o br:/rw_branch:/ro_branch none /aufs +# sudo mount -t aufs -o br:/rw_branch/${d}:/ro_branch/${d} none /aufs/another/$d +.fi + +There are several characters which are not allowed to use in a branch +directory path and xino filename. See detail in Branch Syntax and Mount +Option. + +The file-lock which means fcntl(2) with F_SETLK, F_SETLKW or F_GETLK, flock(2) +and lockf(3), is applied to virtual aufs file only, not to the file on a +branch. It means you can break the lock by accessing a branch directly. +TODO: check \[oq]security\[cq] to hook locks, as inotify does. + +The I/O to the named pipe or local socket are not handled by aufs, even +if it exists in aufs. After the reader and the writer established their +connection if the pipe/socket are copied-up, they keep using the old one +instead of the copied-up one. + +The fsync(2) and fdatasync(2) systemcalls return 0 which means success, even +if the given file descriptor is not opened for writing. +I am afraid this behaviour may violate some standards. Checking the +behaviour of fsync(2) on ext2, aufs decided to return success. + +If you want to use disk-quota, you should set it up to your writable +branch since aufs does not have its own block device. + +When your aufs is the root directory of your system, and your system +tells you some of the filesystem were not unmounted cleanly, try these +procedure when you shutdown your system. +.nf +# mount -no remount,ro / +# for i in $writable_branches +# do mount -no remount,ro $i +# done +.fi +If your xino file is on a hard drive, you also need to specify +\[oq]noxino\[cq] option or \[oq]xino=/your/tmpfs/xino\[cq] at remounting root +directory. + +To rename(2) directory may return EXDEV even if both of src and tgt +are on the same aufs. When the rename-src dir exists on multiple +branches and the lower dir has child(ren), aufs has to copyup all his +children. It can be recursive copyup. Current aufs does not support +such huge copyup operation at one time in kernel space, instead +produces a warning and returns EXDEV. +Generally, mv(1) detects this error and tries mkdir(2) and +rename(2) or copy/unlink recursively. So the result is harmless. +If your application which issues rename(2) for a directory does not +support EXDEV, it will not work on aufs. +Also this specification is applied to the case when the src directroy +exists on the lower readonly branch and it has child(ren). + +If a sudden accident such like a power failure happens during aufs is +performing, and regular fsck for branch filesystems is completed after +the disaster, you need to extra fsck for aufs writable branches. It is +necessary to check whether the whiteout remains incorrectly or not, +eg. the real filename and the whiteout for it under the same parent +directory. If such whiteout remains, aufs cannot handle the file +correctly. +To check the consistency from the aufs\[aq] point of view, you can use a +simple shell script called /sbin/auchk. Its purpose is a fsck tool for +aufs, and it checks the illegal whiteout, the remained +pseudo-links and the remained aufs-temp files. If they are found, the +utility reports you and asks whether to delete or not. +It is recommended to execute /sbin/auchk for every writable branch +filesystem before mouting aufs if the system experienced crash. + + +.\" ---------------------------------------------------------------------- +.SH EXAMPLES +The mount options are interpreted from left to right at remount-time. +These examples +shows how the options are handled. (assuming /sbin/mount.aufs was +installed) + +.nf +# mount -v -t aufs br:/day0:/base none /u +none on /u type aufs (rw,xino=/day0/.aufs.xino,br:/day0=rw:/base=ro) +# mount -v -o remount,\\ + prepend:/day1,\\ + xino=/day1/xino,\\ + mod:/day0=ro,\\ + del:/day0 \\ + /u +none on /u type aufs (rw,xino=/day1/xino,br:/day1=rw:/base=ro) +.fi + +.nf +# mount -t aufs br:/rw none /u +# mount -o remount,append:/ro /u +different uid/gid/permission, /ro +# mount -o remount,del:/ro /u +# mount -o remount,nowarn_perm,append:/ro /u +# +(there is no warning) +.fi + +.\" If you want to expand your filesystem size, aufs may help you by +.\" adding an writable branch. Since aufs supports multiple writable +.\" branches, the old writable branch can be being writable, if you want. +.\" In this example, any modifications to the files under /ro branch will +.\" be copied-up to /new, but modifications to the files under /rw branch +.\" will not. +.\" And the next example shows the modifications to the files under /rw branch +.\" will be copied-up to /new/a. +.\" +.\" Todo: test multiple writable branches policy. cpup=nearest, cpup=exist_parent. +.\" +.\" .nf +.\" # mount -v -t aufs br:/rw:/ro none /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/ro=ro) +.\" # mkfs /new +.\" # mount -v -o remount,add:1:/new=rw /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/new=rw:/ro=ro) +.\" .fi +.\" +.\" .nf +.\" # mount -v -t aufs br:/rw:/ro none /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/rw=rw:/ro=ro) +.\" # mkfs /new +.\" # mkdir /new/a new/b +.\" # mount -v -o remount,add:1:/new/b=rw,prepend:/new/a,mod:/rw=ro /u +.\" none on /u type aufs (rw,xino=/rw/.aufs.xino,br:/new/a=rw:/rw=ro:/new/b=rw:/ro=ro) +.\" .fi + +When you use aufs as root filesystem, it is recommended to consider to +exclude some directories. For example, /tmp and /var/log are not need +to stack in many cases. They do not usually need to copyup or to whiteout. +Also the swapfile on aufs (a regular file, not a block device) is not +supported. +In order to exclude the specific dir from aufs, try bind mounting. + +And there is a good sample which is for network booted diskless machines. See +sample/ in detail. + +.\" ---------------------------------------------------------------------- +.SH DIAGNOSTICS +When you add a branch to your union, aufs may warn you about the +privilege or security of the branch, which is the permission bits, +owner and group of the top directory of the branch. +For example, when your upper writable branch has a world writable top +directory, +a malicious user can create any files on the writable branch directly, +like copyup and modify manually. I am afraid it can be a security +issue. + +When you mount or remount your union without \-o ro common mount option +and without writable branch, aufs will warn you that the first branch +should be writable. + +.\" It is discouraged to set both of \[oq]udba\[cq] and \[oq]noxino\[cq] mount options. In +.\" this case the inode number under aufs will always be changed and may +.\" reach the end of inode number which is a maximum of unsigned long. If +.\" the inode number reaches the end, aufs will return EIO repeatedly. + +When you set udba other than inotify and change something on your +branch filesystem directly, later aufs may detect some mismatches to +its cache. If it is a critical mismatch, aufs returns EIO. + +When an error occurs in aufs, aufs prints the kernel message with +\[oq]errno.\[cq] The priority of the message (log level) is ERR or WARNING which +depends upon the message itself. +You can convert the \[oq]errno\[cq] into the error message by perror(3), +strerror(3) or something. +For example, the \[oq]errno\[cq] in the message \[oq]I/O Error, write failed (\-28)\[cq] +is 28 which means ENOSPC or \[oq]No space left on device.\[cq] + +When CONFIG_AUFS_BR_RAMFS is enabled, you can specify ramfs as an aufs +branch. Since ramfs is simple, it doesn't set the maximum link count +originally. In aufs, it is very dangerous, particulary for +whiteouts. Finally aufs sets the maxmimum link count for ramfs. Tha +value is 32000 which is borrowed from ext2. + + +.\" .SH Current Limitation +. +.\" ---------------------------------------------------------------------- +.\" SYNOPSIS +.\" briefly describes the command or function\[aq]s interface. For commands, this +.\" shows the syntax of the command and its arguments (including options); bold- +.\" face is used for as-is text and italics are used to indicate replaceable +.\" arguments. Brackets ([]) surround optional arguments, vertical bars (|) sep- +.\" arate choices, and ellipses (...) can be repeated. For functions, it shows +.\" any required data declarations or #include directives, followed by the func- +.\" tion declaration. +. +.\" DESCRIPTION +.\" gives an explanation of what the command, function, or format does. Discuss +.\" how it interacts with files and standard input, and what it produces on +.\" standard output or standard error. Omit internals and implementation +.\" details unless they\[aq]re critical for understanding the interface. Describe +.\" the usual case; for information on options use the OPTIONS section. If +.\" there is some kind of input grammar or complex set of subcommands, consider +.\" describing them in a separate USAGE section (and just place an overview in +.\" the DESCRIPTION section). +. +.\" RETURN VALUE +.\" gives a list of the values the library routine will return to the caller and +.\" the conditions that cause these values to be returned. +. +.\" EXIT STATUS +.\" lists the possible exit status values or a program and the conditions that +.\" cause these values to be returned. +. +.\" USAGE +.\" describes the grammar of any sublanguage this implements. +. +.\" FILES +.\" lists the files the program or function uses, such as configuration files, +.\" startup files, and files the program directly operates on. Give the full +.\" pathname of these files, and use the installation process to modify the +.\" directory part to match user preferences. For many programs, the default +.\" installation location is in /usr/local, so your base manual page should use +.\" /usr/local as the base. +. +.\" ENVIRONMENT +.\" lists all environment variables that affect your program or function and how +.\" they affect it. +. +.\" SECURITY +.\" discusses security issues and implications. Warn about configurations or +.\" environments that should be avoided, commands that may have security impli- +.\" cations, and so on, especially if they aren\[aq]t obvious. Discussing security +.\" in a separate section isn\[aq]t necessary; if it\[aq]s easier to understand, place +.\" security information in the other sections (such as the DESCRIPTION or USAGE +.\" section). However, please include security information somewhere! +. +.\" CONFORMING TO +.\" describes any standards or conventions this implements. +. +.\" NOTES +.\" provides miscellaneous notes. +. +.\" BUGS +.\" lists limitations, known defects or inconveniences, and other questionable +.\" activities. + +.SH COPYRIGHT +Copyright \(co 2005\-2009 Junjiro R. Okajima + +.SH AUTHOR +Junjiro R. Okajima + +.\" SEE ALSO +.\" lists related man pages in alphabetical order, possibly followed by other +.\" related pages or documents. Conventionally this is the last section. diff -urN ./Documentation/filesystems/aufs/design/01intro.txt ./Documentation/filesystems/aufs/design/01intro.txt --- ./Documentation/filesystems/aufs/design/01intro.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/01intro.txt 2009-04-06 10:26:45.000000000 +0200 @@ -0,0 +1,128 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +Introduction +---------------------------------------- + +aufs [ei ju: ef es] | [a u f s] +1. abbrev. for "advanced multi-layered unification filesystem". +2. abbrev. for "another unionfs". +3. abbrev. for "auf das" in German which means "on the" in English. + Ex. "Butter aufs Brot"(G) means "butter onto bread"(E). + But "Filesystem aufs Filesystem" is hard to understand. + +AUFS is a filesystem with features: +- multi layered stackable unification filesystem, the member directory + is called as a branch. +- branch permission and attribute, 'readonly', 'real-readonly', + 'readwrite', 'whiteout-able', 'link-able whiteout' and their + combination. +- internal "file copy-on-write". +- logical deletion, whiteout. +- dynamic branch manipulation, adding, deleting and changing permission. +- allow bypassing aufs, user's direct branch access. +- external inode number translation table and bitmap which maintains the + persistent aufs inode number. +- seekable directory, including NFS readdir. +- file mapping, mmap and sharing pages. +- pseudo-link, hardlink over branches. +- loopback mounted filesystem as a branch. +- several policies to select one among multiple writable branches. +- revert a single systemcall when an error occurs in aufs. +- and more... + + +Multi Layered Stackable Unification Filesystem +---------------------------------------------------------------------- +Most people already knows what it is. +It is a filesystem which unifies several directories and provides a +merged single directory. When users access a file, the access will be +passed/re-directed/converted (sorry, I am not sure which English word is +correct) to the real file on the member filesystem. The member +filesystem is called 'lower filesystem' or 'branch' and has a mode +'readonly' and 'readwrite.' And the deletion for a file on the lower +readonly branch is handled by creating 'whiteout' on the upper writable +branch. + +On LKML, there have been discussions about UnionMount (Jan Blunck and +Bharata B Rao) and Unionfs (Erez Zadok). They took different approaches +to implement the merged-view. +The former tries putting it into VFS, and the latter implements as a +separate filesystem. +(If I misunderstand about these implementations, please let me know and +I shall correct it. Because it is a long time ago when I read their +source files last time). +UnionMount's approach will be able to small, but may be hard to share +branches between several UnionMount since the whiteout in it is +implemented in the inode on branch filesystem and always +shared. According to Bharata's post, readdir does not seems to be +finished yet. +Unionfs has a longer history. When I started implementing a stacking filesystem +(Aug 2005), it already existed. It has virtual super_block, inode, +dentry and file objects and they have an array pointing lower same kind +objects. After contributing many patches for Unionfs, I re-started my +project AUFS (Jun 2006). + +In AUFS, the structure of filesystem resembles to Unionfs, but I +implemented my own ideas, approaches and enhancements and it became +totally different one. + + +Several characters/aspects of aufs +---------------------------------------------------------------------- + +Aufs has several characters or aspects. +1. a filesystem, callee of VFS helper +2. sub-VFS, caller of VFS helper for branches +3. a virtual filesystem which maintains persistent inode number +4. reader/writer of files on branches such like an application + +1. Caller of VFS Helper +As an ordinary linux filesystem, aufs is a callee of VFS. For instance, +unlink(2) from an application reaches sys_unlink() kernel function and +then vfs_unlink() is called. vfs_unlink() is one of VFS helper and it +calls filesystem specific unlink operation. Actually aufs implements the +unlink operation but it behaves like a redirector. + +2. Caller of VFS Helper for Branches +aufs_unlink() passes the unlink request to the branch filesystem as if +it were called from VFS. So the called unlink operation of the branch +filesystem acts as usual. As a caller of VFS helper, aufs should handle +every necessary pre/post operation for the branch filesystem. +- acquire the lock for the parent dir on a branch +- lookup in a branch +- revalidate dentry on a branch +- mnt_want_write() for a branch +- vfs_unlink() for a branch +- mnt_drop_write() for a branch +- release the lock on a branch + +3. Persistent Inode Number +One of the most important issue for a filesystem is to maintain inode +numbers. This is particularly important to support exporting a +filesystem via NFS. Aufs is a virtual filesystem which doesn't have a +backend block device for its own. But some storage is necessary to +maintain inode number. It may be a large space and may not suit to keep +in memory. Aufs rents some space from its first writable branch +filesystem (by default) and creates file(s) on it. These files are +created by aufs internally and removed soon (currently) keeping opened. +Note: Because these files are removed, they are totally gone after + unmounting aufs. It means the inode numbers are not persistent + across unmount or reboot. I have a plan to make them really + persistent which will be important for aufs on NFS server. + +4. Read/Write Files Internally (copy-on-write) +Because a branch can be readonly, when you write a file on it, aufs will +"copy-up" it to the upper writable branch internally. And then write the +originally requested thing to the file. Generally kernel doesn't +open/read/write file actively. In aufs, even a single write may cause a +internal "file copy". This behaviour is very similar to cp(1) command. + +Some people may think it is better to pass such work to user space +helper, instead of doing in kernel space. Actually I am still thinking +about it. But currently I have implemented it in kernel space. diff -urN ./Documentation/filesystems/aufs/design/02struct.txt ./Documentation/filesystems/aufs/design/02struct.txt --- ./Documentation/filesystems/aufs/design/02struct.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/02struct.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,205 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +Basic Aufs Internal Structure + +Superblock/Inode/Dentry/File Objects +---------------------------------------------------------------------- +As like an ordinary filesystem, aufs has its own +superblock/inode/dentry/file objects. All these objects have a +dynamically allocated array and store the same kind of pointers to the +lower filesystem, branch. +For example, when you build a union with one readwrite branch and one +readonly, mounted /au, /rw and /ro respectively. +- /au = /rw + /ro +- /ro/fileA exists but /rw/fileA + +Aufs lookup operation finds /ro/fileA and gets dentry for that. These +pointers are stored in a aufs dentry. The array in aufs dentry will be, +- [0] = NULL +- [1] = /ro/fileA + +This style of an array is essentially same to the aufs +superblock/inode/dentry/file objects. + +Because aufs supports manipulating branches, ie. add/delete/change +dynamically, these objects has its own generation. When branches are +changed, the generation in aufs superblock is incremented. And a +generation in other object are compared when it is accessed. +When a generation in other objects are obsoleted, aufs refreshes the +internal array. + + +Superblock +---------------------------------------------------------------------- +Additionally aufs superblock has some data for policies to select one +among multiple writable branches, XIB files, pseudo-links and kobject. +See below in detail. +About the policies which supports copy-down a directory, see policy.txt +too. + + +Branch and XINO(External Inode Number Translation Table) +---------------------------------------------------------------------- +Every branch has its own xino (external inode number translation table) +file. The xino file is created and unlinked by aufs internally. When two +members of a union exist on the same filesystem, they share the single +xino file. +The struct of a xino file is simple, just a sequence of aufs inode +numbers which is indexed by the lower inode number. +In the above sample, assume the inode number of /ro/fileA is i111 and +aufs assigns the inode number i999 for fileA. Then aufs writes 999 as +4(8) bytes at 111 * 4(8) bytes offset in the xino file. + +When the inode numbers are not contiguous, the xino file will be sparse +which has a hole in it and doesn't consume as much disk space as it +might appear. If your branch filesystem consumes disk space for such +holes, then you should specify 'xino=' option at mounting aufs. + +Also a writable branch has three kinds of "whiteout bases". All these +are existed when the branch is joined to aufs and the names are +whiteout-ed doubly, so that users will never see their names in aufs +hierarchy. +1. a regular file which will be linked to all whiteouts. +2. a directory to store a pseudo-link. +3. a directory to store an "orphan-ed" file temporary. + +1. Whiteout Base + When you remove a file on a readonly branch, aufs handles it as a + logical deletion and creates a whiteout on the upper writable branch + as a hardlink of this file in order not to consume inode on the + writable branch. +2. Pseudo-link Dir + See below, Pseudo-link. +3. Step-Parent Dir + When "fileC" exists on the lower readonly branch only and it is + opened and removed with its parent dir, and then user writes + something into it, then aufs copies-up fileC to this + directory. Because there is no other dir to store fileC. After + creating a file under this dir, the file is unlinked. + +Because aufs supports manipulating branches, ie. add/delete/change +dynamically, a branch has its own id. When the branch order changes, aufs +finds the new index by searching the branch id. + + +Pseudo-link +---------------------------------------------------------------------- +Assume "fileA" exists on the lower readonly branch only and it is +hardlinked to "fileB" on the branch. When you write something to fileA, +aufs copies-up it to the upper writable branch. Additionally aufs +creates a hardlink under the Pseudo-link Directory of the writable +branch. The inode of a pseudo-link is kept in aufs super_block as a +simple list. If fileB is read after unlinking fileA, aufs returns +filedata from the pseudo-link instead of the lower readonly +branch. Because the pseudo-link is based upon the inode, to keep the +inode number by xino (see above) is important. + +All the hardlinks under the Pseudo-link Directory of the writable branch +should be restored in a proper location later. Aufs provides a utility +to do this. The userspace helpers executed at remounting and unmounting +aufs by default. + + +XIB(external inode number bitmap) +---------------------------------------------------------------------- +Addition to the xino file per a branch, aufs has an external inode number +bitmap in a superblock object. It is also a file such like a xino file. +It is a simple bitmap to mark whether the aufs inode number is in-use or +not. +To reduce the file I/O, aufs prepares a single memory page to cache xib. + +Aufs implements a feature to truncate/refresh both of xino and xib to +reduce the number of consumed disk blocks for these files. + + +Virtual or Vertical Dir +---------------------------------------------------------------------- +In order to support multiple layers (branches), aufs readdir operation +constructs a virtual dir block on memory. For readdir, aufs calls +vfs_readdir() internally for each dir on branches, merges their entries +with eliminating the whiteout-ed ones, and sets it to file (dir) +object. So the file object has its entry list until it is closed. The +entry list will be updated when the file position is zero and becomes +old. This decision is made in aufs automatically. + +The dynamically allocated memory block for the name of entries has a +unit of 512 bytes (by default) and stores the names contiguously (no +padding). Another block for each entry is handled by kmem_cache too. +During building dir blocks, aufs creates hash list and judging whether +the entry is whiteouted by its upper branch or already listed. + +Some people may call it can be a security hole or invite DoS attack +since the opened and once readdir-ed dir (file object) holds its entry +list and becomes a pressure for system memory. But I'd say it is similar +to files under /proc or /sys. The virtual files in them also holds a +memory page (generally) while they are opened. When an idea to reduce +memory for them is introduced, it will be applied to aufs too. + + +Workqueue +---------------------------------------------------------------------- +Aufs sometimes requires privilege access to a branch. For instance, +in copy-up/down operation. When a user process is going to make changes +to a file which exists in the lower readonly branch only, and the mode +of one of ancestor directories may not be writable by a user +process. Here aufs copy-up the file with its ancestors and they may +require privilege to set its owner/group/mode/etc. +This is a typical case of a application character of aufs (see +Introduction). + +Aufs uses workqueue synchronously for this case. It creates its own +workqueue. The workqueue is a kernel thread and has privilege. Aufs +passes the request to call mkdir or write (for example), and wait for +its completion. This approach solves a problem of a signal handler +simply. +If aufs didn't adopt the workqueue and changed the privilege of the +process, and if the mkdir/write call arises SIGXFSZ or other signal, +then the user process might gain a privilege or the generated core file +was owned by a superuser. But I have a plan to switch to a new +credential approach which will be introduced in linux-2.6.29. + +Also aufs uses the system global workqueue ("events" kernel thread) too +for asynchronous tasks, such like handling inotify, re-creating a +whiteout base and etc. This is unrelated to a privilege. +Most of aufs operation tries acquiring a rw_semaphore for aufs +superblock at the beginning, at the same time waits for the completion +of all queued asynchronous tasks. + + +Whiteout +---------------------------------------------------------------------- +The whiteout in aufs is very similar to Unionfs's. That is represented +by its filename. UnionMount takes an approach of a file mode, but I am +afraid several utilities (find(1) or something) will have to support it. + +Basically the whiteout represents "logical deletion" which stops aufs to +lookup further, but also it represents "dir is opaque" which also stop +lookup. + +In aufs, rmdir(2) and rename(2) for dir uses whiteout alternatively. +In order to make several functions in a single systemcall to be +revertible, aufs adopts an approach to rename a directory to a temporary +unique whiteouted name. +For example, in rename(2) dir where the target dir already existed, aufs +renames the target dir to a temporary unique whiteouted name before the +actual rename on a branch and then handles other actions (make it opaque, +update the attributes, etc). If an error happens in these actions, aufs +simply renames the whiteouted name back and returns an error. If all are +succeeded, aufs registers a function to remove the whiteouted unique +temporary name completely and asynchronously to the system global +workqueue. + + +Copy-up +---------------------------------------------------------------------- +It is a well-known feature or concept. +When user modifies a file on a readonly branch, aufs operate "copy-up" +internally and makes change to the new file on the upper writable branch. +When the trigger systemcall does not update the timestamps of the parent +dir, aufs reverts it after copy-up. diff -urN ./Documentation/filesystems/aufs/design/03lookup.txt ./Documentation/filesystems/aufs/design/03lookup.txt --- ./Documentation/filesystems/aufs/design/03lookup.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/03lookup.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,95 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +Lookup in a Branch +---------------------------------------------------------------------- +Since aufs has a character of sub-VFS (see Introduction), it operates +lookup for branches as VFS does. It may be a heavy work. Generally +speaking struct nameidata is a bigger structure and includes many +information. But almost all lookup operation in aufs is the simplest +case, ie. lookup only an entry directly connected to its parent. Digging +down the directory hierarchy is unnecessary. + +VFS has a function lookup_one_len() for that use, but it is not usable +for a branch filesystem which requires struct nameidata. So aufs +implements a simple lookup wrapper function. When a branch filesystem +allows NULL as nameidata, it calls lookup_one_len(). Otherwise it builds +a simplest nameidata and calls lookup_hash(). +Here aufs applies "a principle in NFSD", ie. if the filesystem supports +NFS-export, then it has to support NULL as a nameidata parameter for +->create(), ->lookup() and ->d_revalidate(). So the lookup wrapper in +aufs tests if ->s_export_op in the branch is NULL or not. + +When a branch is a remote filesystem, aufs trusts its ->d_revalidate(). +For d_revalidate, aufs implements three levels of revalidate tests. See +"Revalidate Dentry and UDBA" in detail. + + +Loopback Mount +---------------------------------------------------------------------- +Basically aufs supports any type of filesystem and block device for a +branch (actually there are some exceptions). But it is prohibited to add +a loopback mounted one whose backend file exists in a filesystem which is +already added to aufs. The reason is to protect aufs from a recursive +lookup. If it was allowed, the aufs lookup operation might re-enter a +lookup for the loopback mounted branch in the same context, and will +cause a deadlock. + + +Revalidate Dentry and UDBA (User's Direct Branch Access) +---------------------------------------------------------------------- +Generally VFS helpers re-validate a dentry as a part of lookup. +0. digging down the directory hierarchy. +1. lock the parent dir by its i_mutex. +2. lookup the final (child) entry. +3. revalidate it. +4. call the actual operation (create, unlink, etc.) +5. unlock the parent dir + +If the filesystem implements its ->d_revalidate() (step 3), then it is +called. Actually aufs implements it and checks the dentry on a branch is +still valid. +But it is not enough. Because aufs has to release the lock for the +parent dir on a branch at the end of ->lookup() (step 2) and +->d_revalidate() (step 3) while the i_mutex of the aufs dir is still +held by VFS. +If the file on a branch is changed directly, eg. bypassing aufs, after +aufs released the lock, then the subsequent operation may cause +something unpleasant result. + +This situation is a result of VFS architecture, ->lookup() and +->d_revalidate() is separated. But I never say it is wrong. It is a good +design from VFS's point of view. It is just not suitable for sub-VFS +character in aufs. + +Aufs supports such case by three level of revalidation which is +selectable by user. +1. Simple Revalidate + Addition to the native flow in VFS's, confirm the child-parent + relationship on the branch just after locking the parent dir on the + branch in the "actual operation" (step 4). When this validation + fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still + checks the validation of the dentry on branches. +2. Monitor Changes Internally by Inotify + Addition to above, in the "actual operation" (step 4) aufs re-lookup + the dentry on the branch, and returns EBUSY if it finds different + dentry. + Additionally, aufs sets the inotify watch for every dir on branches + during it is in cache. When the event is notified, aufs registers a + function to kernel 'events' thread by schedule_work(). And the + function sets some special status to the cached aufs dentry and inode + private data. If they are not cached, then aufs has nothing to + do. When the same file is accessed through aufs (step 0-3) later, + aufs will detect the status and refresh all necessary data. + In this mode, aufs has to ignore the event which is fired by aufs + itself. +3. No Extra Validation + This is the simplest test and doesn't add any additional revalidation + test, and skip therevalidatin in step 4. It is useful and improves + aufs performance when system surely hide the aufs branches from user, + by over-mounting something (or another method). diff -urN ./Documentation/filesystems/aufs/design/04branch.txt ./Documentation/filesystems/aufs/design/04branch.txt --- ./Documentation/filesystems/aufs/design/04branch.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/04branch.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,67 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +Branch Manipulation + +Since aufs supports dynamic branch manipulation, ie. add/remove a branch +and changing its permission/attribute, there are a lot of works to do. + + +Add a Branch +---------------------------------------------------------------------- +o Confirm the adding dir exists outside of aufs, including loopback + mount. +- and other various attributes... +o Initialize the xino file and whiteout bases if necessary. + See struct.txt. + +o Check the owner/group/mode of the directory + When the owner/group/mode of the adding directory differs from the + existing branch, aufs issues a warning because it may impose a + security risk. + For example, when a upper writable branch has a world writable empty + top directory, a malicious user can create any files on the writable + branch directly, like copy-up and modify manually. If something like + /etc/{passwd,shadow} exists on the lower readonly branch but the upper + writable branch, and the writable branch is world-writable, then a + malicious guy may create /etc/passwd on the writable branch directly + and the infected file will be valid in aufs. + I am afraid it can be a security issue, but nothing to do except + producing a warning. + + +Delete a Branch +---------------------------------------------------------------------- +o Confirm the deleting branch is not busy + To be general, there is one merit to adopt "remount" interface to + manipulate branches. It is to discard caches. At deleting a branch, + aufs checks the still cached (and connected) dentries and inodes. If + there are any, then they are all in-use. An inode without its + corresponding dentry can be alive alone (for example, inotify case). + + For the cached one, aufs checks whether the same named entry exists on + other branches. + If the cached one is a directory, because aufs provides a merged view + to users, as long as one dir is left on any branch aufs can show the + dir to users. In this case, the branch can be removed from aufs. + Otherwise aufs rejects deleting the branch. + + If any file on the deleting branch is opened by aufs, then aufs + rejects deleting. + + +Modify the Permission of a Branch +---------------------------------------------------------------------- +o Re-initialize or remove the xino file and whiteout bases if necessary. + See struct.txt. + +o rw --> ro: Confirm the modifying branch is not busy + Aufs rejects the request if any of these conditions are true. + - a file on the branch is mmap-ed. + - a regular file on the branch is opened for write and there is no + same named entry on the upper branch. diff -urN ./Documentation/filesystems/aufs/design/05wbr_policy.txt ./Documentation/filesystems/aufs/design/05wbr_policy.txt --- ./Documentation/filesystems/aufs/design/05wbr_policy.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/05wbr_policy.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,57 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + + +Policies to Select One among Multiple Writable Branches +---------------------------------------------------------------------- +When the number of writable branch is more than one, aufs has to decide +the target branch for file creation or copy-up. By default, the highest +writable branch which has the parent (or ancestor) dir of the target +file is chosen (top-down-parent policy). +By user's request, aufs implements some other policies to select the +writable branch, for file creation two policies, round-robin and +most-free-space policies. For copy-up three policies, top-down-parent, +bottom-up-parent and bottom-up policies. + +As expected, the round-robin policy selects the branch in circular. When +you have two writable branches and creates 10 new files, 5 files will be +created for each branch. mkdir(2) systemcall is an exception. When you +create 10 new directories, all will be created on the same branch. +And the most-free-space policy selects the one which has most free +space among the writable branches. The amount of free space will be +checked by aufs internally, and users can specify its time interval. + +The policies for copy-up is more simple, +top-down-parent is equivalent to the same named on in create policy, +bottom-up-parent selects the writable branch where the parent dir +exists and the nearest upper one from the copyup-source, +bottom-up selects the nearest upper writable branch from the +copyup-source, regardless the existence of the parent dir. + +There are some rules or exceptions to apply these policies. +- If there is a readonly branch above the policy-selected branch and + the parent dir is marked as opaque (a variation of whiteout), or the + target (creating) file is whiteout-ed on the upper readonly branch, + then the result of the policy is ignored and the target file will be + created on the nearest upper writable branch than the readonly branch. +- If there is a writable branch above the policy-selected branch and + the parent dir is marked as opaque or the target file is whiteouted + on the branch, then the result of the policy is ignored and the target + file will be created on the highest one among the upper writable + branches who has diropq or whiteout. In case of whiteout, aufs removes + it as usual. +- link(2) and rename(2) systemcalls are exceptions in every policy. + They try selecting the branch where the source exists as possible + since copyup a large file will take long time. If it can't be, + ie. the branch where the source exists is readonly, then they will + follow the copyup policy. +- There is an exception for rename(2) when the target exists. + If the rename target exists, aufs compares the index of the branches + where the source and the target exists and selects the higher + one. If the selected branch is readonly, then aufs follows the + copyup policy. diff -urN ./Documentation/filesystems/aufs/design/06fmode_exec.txt ./Documentation/filesystems/aufs/design/06fmode_exec.txt --- ./Documentation/filesystems/aufs/design/06fmode_exec.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/06fmode_exec.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,24 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +FMODE_EXEC and deny_write() +---------------------------------------------------------------------- +Generally Unix prevents an executing file from writing its filedata. +In linux it is implemented by deny_write() and allow_write(). +When a file is executed by exec() family, open_exec() (and sys_uselib()) +they opens the file and calls deny_write(). If the file is aufs's virtual +one, it has no meaning. The file which deny_write() is really necessary +is the file on a branch. But the FMODE_EXEC flag is not passed to +->open() operation. So aufs adopt a dirty trick. + +- in order to get FMODE_EXEC, aufs ->lookup() and ->d_revalidate() set + nd->intent.open.file->private_data to nd->intent.open.flags temporary. +- in aufs ->open(), when FMODE_EXEC is set in file->private_data, it + calls deny_write() for the file on a branch. +- when the aufs file is released, allow_write() for the file on a branch + is called. diff -urN ./Documentation/filesystems/aufs/design/07mmap.txt ./Documentation/filesystems/aufs/design/07mmap.txt --- ./Documentation/filesystems/aufs/design/07mmap.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/07mmap.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,44 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +mmap(2) -- File Memory Mapping +---------------------------------------------------------------------- +In aufs, the file-mapped pages are shared between the file on a branch +and the virtual one in aufs by overriding vm_operation, particularly +->fault(). + +In aufs_mmap(), +- get and store vm_ops of the real file on a branch. +- map the file of aufs by generic_file_mmap() and set aufs's vm + operations. + +In aufs_fault(), +- get the file of aufs from the passed vma, sleep if needed. +- get the real file on a branch from the aufs file. +- a race may happen. for instance a multithreaded library. so some lock + is implemented. +- call ->fault() in the previously stored vm_ops with setting the + real file on a branch to vm_file. +- restore vm_file and wake_up if someone else got sleep. + +When a branch is added to or deleted from aufs, the same-named file may +unveil and its contents will be replaced by the new one when a process +read(2) through previously opened file. +(Some users may not want to refresh the filedata. For such users, I +have a plan to implement a mount option 'refrof' which decides to +refresh the opened files or not. See plan.txt too.) +In this case, an already mapped file will not be updated since the +contents are a part of a process already and it should not be changed by +aufs branch manipulation. (Even if MAP_SHARED is specified, currently). +Of course, in case of the deleting branch has a busy file, it cannot be +deleted from the union. + +In Unionfs, it took an approach which the memory pages mapped to +filedata are copied from the lower (real) file into the Unionfs's +virtual one and handles it by address_space operations. Recently Unionfs +changed it to this approach which aufs adopted since Jul 2006. diff -urN ./Documentation/filesystems/aufs/design/08export.txt ./Documentation/filesystems/aufs/design/08export.txt --- ./Documentation/filesystems/aufs/design/08export.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/08export.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,51 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + + +Export Aufs via NFS +---------------------------------------------------------------------- +Here is an approach. +- like xino/xib, add a new file 'xigen' which stores aufs inode + generation. +- iget_locked(): initialize aufs inode generation for a new inode, and + store it in xigen file. +- destroy_inode(): increment aufs inode generation and store it in xigen + file. it is necessary even if it is not unlinked, because any data of + inode may be changed by UDBA. +- encode_fh(): for a root dir, simply return FILEID_ROOT. otherwise + build file handle by + + branch id (4 bytes) + + superblock generation (4 bytes) + + inode number (4 or 8 bytes) + + parent dir inode number (4 or 8 bytes) + + inode generation (4 bytes)) + + return value of exportfs_encode_fh() for the parent on a branch (4 + bytes) + + file handle for a branch (by exportfs_encode_fh()) +- fh_to_dentry(): + + find the index of a branch from its id in handle, and check it is + still exist in aufs. + + 1st level: get the inode number from handle and search it in cache. + + 2nd level: if not found, get the parent inode number from handle and + search it in cache. and then open the parent dir, find the matching + inode number by vfs_readdir() and get its name, and call + lookup_one_len() for the target dentry. + + 3rd level: if the parent dir is not cached, call + exportfs_decode_fh() for a branch and get the parent on a branch, + build a pathname of it, convert it a pathname in aufs, call + path_lookup(). now aufs gets a parent dir dentry, then handle it as + the 2nd level. + + to open the dir, aufs needs struct vfsmount. aufs keeps vfsmount + for every branch, but not itself. to get this, (currently) aufs + searches in current->nsproxy->mnt_ns list. it may not be a good + idea, but I didn't get other approach. + + test the generation of the gotten inode. +- every inode operation: they may get EBUSY due to UDBA. in this case, + convert it into ESTALE for NFSD. +- readdir(): call lockdep_on/off() because filldir in NFSD calls + lookup_one_len(), vfs_getattr(), encode_fh() and others. diff -urN ./Documentation/filesystems/aufs/design/99plan.txt ./Documentation/filesystems/aufs/design/99plan.txt --- ./Documentation/filesystems/aufs/design/99plan.txt 1970-01-01 01:00:00.000000000 +0100 +++ ./Documentation/filesystems/aufs/design/99plan.txt 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,125 @@ + +# Copyright (C) 2005-2009 Junjiro R. Okajima +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. + +Plan + +Restoring some features which was implemented in aufs1. +They were dropped in aufs2 in order to make source files simpler and +easier to be reviewed. + + +Test Only the Highest One for the Directory Permission (dirperm1 option) +---------------------------------------------------------------------- +Let's try case study. +- aufs has two branches, upper readwrite and lower readonly. + /au = /rw + /ro +- "dirA" exists under /ro, but /rw. and its mode is 0700. +- user invoked "chmod a+rx /au/dirA" +- then "dirA" becomes world readable? + +In this case, /ro/dirA is still 0700 since it exists in readonly branch, +or it may be a natively readonly filesystem. If aufs respects the lower +branch, it should not respond readdir request from other users. But user +allowed it by chmod. Should really aufs rejects showing the entries +under /ro/dirA? + +To be honest, I don't have a best solution for this case. So I +implemented 'dirperm1' and 'nodirperm1' option in aufs1, and leave it to +users. +When dirperm1 is specified, aufs checks only the highest one for the +directory permission, and shows the entries. Otherwise, as usual, checks +every dir existing on all branches and rejects the request. + +As a side effect, dirperm1 option improves the performance of aufs +because the number of permission check is reduced. + + +Show Whiteout Mode (shwh) +---------------------------------------------------------------------- +Generally aufs hides the name of whiteouts. But in some cases, to show +them is very useful for users. For instance, creating a new middle layer +(branch) by merging existing layers. + +(borrowing aufs1 HOW-TO from a user, Michael Towers) +When you have three branches, +- Bottom: 'system', squashfs (underlying base system), read-only +- Middle: 'mods', squashfs, read-only +- Top: 'overlay', ram (tmpfs), read-write + +The top layer is loaded at boot time and saved at shutdown, to preserve +the changes made to the system during the session. +When larger changes have been made, or smaller changes have accumulated, +the size of the saved top layer data grows. At this point, it would be +nice to be able to merge the two overlay branches ('mods' and 'overlay') +and rewrite the 'mods' squashfs, clearing the top layer and thus +restoring save and load speed. + +This merging is simplified by the use of another aufs mount, of just the +two overlay branches using the 'shwh' option. +# mount -t aufs -o ro,shwh,br:/livesys/overlay=ro+wh:/livesys/mods=rr+wh \ + aufs /livesys/merge_union + +A merged view of these two branches is then available at +/livesys/merge_union, and the new feature is that the whiteouts are +visible! +Note that in 'shwh' mode the aufs mount must be 'ro', which will disable +writing to all branches. Also the default mode for all branches is 'ro'. +It is now possible to save the combined contents of the two overlay +branches to a new squashfs, e.g.: +# mksquashfs /livesys/merge_union /path/to/newmods.squash + +This new squashfs archive can be stored on the boot device and the +initramfs will use it to replace the old one at the next boot. + + +Being Another Aufs's Readonly Branch (robr) +---------------------------------------------------------------------- +Aufs1 allows aufs to be another aufs's readonly branch. +This feature was developed by a user's request. But it may not be used +currecnly. + + +Copy-up on Open (coo=) +---------------------------------------------------------------------- +By default the internal copy-up is executed when it is really necessary. +It is not done when a file is opened for writing, but when write(2) is +done. Users who have many (over 100) branches want to know and analyse +when and what file is copied-up. To insert a new upper branch which +contains such files only may improve the performance of aufs. + +Aufs1 implemented "coo=none | leaf | all" option. + + +Refresh the Opened File (refrof) +---------------------------------------------------------------------- +This option is implemented in aufs1 but incomplete. + +When user reads from a file, he expects to get its latest filedata +generally. If the file is removed and a new same named file is created, +the content he gets is unchanged, ie. the unlinked filedata. + +Let's try case study again. +- aufs has two branches. + /au = /rw + /ro +- "fileA" exists under /ro, but /rw. +- user opened "/au/fileA". +- he or someone else inserts a branch (/new) between /rw and /ro. + /au = /rw + /new + /ro +- the new branch has "fileA". +- user reads from the opened "fileA" +- which filedata should aufs return, from /ro or /new? + +Some people says it has to be "from /ro" and it is a semantics of Unix. +The others say it should be "from /new" because the file is not removed +and it is equivalent to the case of someone else modifies the file. + +Here again I don't have a best and final answer. I got an idea to +implement 'refrof' and 'norefrof' option. When 'refrof' (REFResh the +Opened File) is specified (by default), aufs returns the filedata from +/new. +Otherwise from /new. diff -urN ./fs/aufs/Kconfig ./fs/aufs/Kconfig --- ./fs/aufs/Kconfig 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/Kconfig 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,108 @@ +config AUFS_FS + tristate "Aufs (Advanced multi layered unification filesystem) support" + depends on EXPERIMENTAL + help + Aufs is a stackable unification filesystem such as Unionfs, + which unifies several directories and provides a merged single + directory. + In the early days, aufs was entirely re-designed and + re-implemented Unionfs Version 1.x series. Introducing many + original ideas, approaches and improvements, it becomes totally + different from Unionfs while keeping the basic features. + +if AUFS_FS +choice + prompt "Maximum number of branches" + default AUFS_BRANCH_MAX_127 + help + Specifies the maximum number of branches (or member directories) + in a single aufs. The larger value consumes more system + resources and has a minor impact to performance. +config AUFS_BRANCH_MAX_127 + bool "127" + help + Specifies the maximum number of branches (or member directories) + in a single aufs. The larger value consumes more system + resources and has a minor impact to performance. +config AUFS_BRANCH_MAX_511 + bool "511" + help + Specifies the maximum number of branches (or member directories) + in a single aufs. The larger value consumes more system + resources and has a minor impact to performance. +config AUFS_BRANCH_MAX_1023 + bool "1023" + help + Specifies the maximum number of branches (or member directories) + in a single aufs. The larger value consumes more system + resources and has a minor impact to performance. +config AUFS_BRANCH_MAX_32767 + bool "32767" + help + Specifies the maximum number of branches (or member directories) + in a single aufs. The larger value consumes more system + resources and has a minor impact to performance. +endchoice + +config AUFS_HINOTIFY + bool "Use inotify to detect actions on a branch" + depends on INOTIFY + help + If you want to modify files on branches directly, eg. bypassing aufs, + and want aufs to detect the changes of them fully, then enable this + option and use 'udba=inotify' mount option. + It will have a negative impact to the performance. + See detail in aufs.5. + +config AUFS_EXPORT + bool "NFS-exportable aufs" + depends on (AUFS_FS = y && EXPORTFS = y) || (AUFS_FS = m && EXPORTFS) + help + If you want to export your mounted aufs via NFS, then enable this + option. There are several requirements for this configuration. + See detail in aufs.5. + +config AUFS_BR_RAMFS + bool "Ramfs (initramfs/rootfs) as an aufs branch" + help + If you want to use ramfs as an aufs branch fs, then enable this + option. Generally tmpfs is recommended. + Aufs prohibited them to be a branch fs by default, because + initramfs becomes unusable after switch_root or something + generally. If you sets initramfs as an aufs branch and boot your + system by switch_root, you will meet a problem easily since the + files in initramfs may be inaccessible. + Unless you are going to use ramfs as an aufs branch fs without + switch_root or something, leave it N. + +config AUFS_DEBUG + bool "Debug aufs" + help + Enable this to compile aufs internal debug code. + It will have a negative impact to the performance. + +config AUFS_MAGIC_SYSRQ + bool + depends on AUFS_DEBUG && MAGIC_SYSRQ + default y + help + Automatic configuration for internal use. + When aufs supports Magic SysRq, enabled automatically. + +config AUFS_BDEV_LOOP + bool + depends on BLK_DEV_LOOP + default y + help + Automatic configuration for internal use. + Convert =[ym] into =y. + +config AUFS_INO_T_64 + bool + depends on 64BIT && !(ALPHA || S390) + default y + help + Automatic configuration for internal use. + /* typedef unsigned long/int __kernel_ino_t */ + /* alpha and s390x are int */ +endif diff -urN ./fs/aufs/Makefile ./fs/aufs/Makefile --- ./fs/aufs/Makefile 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/Makefile 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,22 @@ + +include ${src}/magic.mk +-include ${src}/priv_def.mk + +obj-$(CONFIG_AUFS_FS) += aufs.o +aufs-y := module.o sbinfo.o super.o branch.o xino.o sysaufs.o opts.o \ + wkq.o vfsub.o dcsub.o \ + cpup.o whout.o plink.o wbr_policy.o \ + dinfo.o dentry.o \ + finfo.o file.o f_op.o \ + dir.o vdir.o \ + iinfo.o inode.o i_op.o i_op_add.o i_op_del.o i_op_ren.o \ + ioctl.o + +# all are boolean +aufs-$(CONFIG_SYSFS) += sysfs.o +aufs-$(CONFIG_DEBUG_FS) += dbgaufs.o +aufs-$(CONFIG_AUFS_BDEV_LOOP) += loop.o +aufs-$(CONFIG_AUFS_HINOTIFY) += hinotify.o +aufs-$(CONFIG_AUFS_EXPORT) += export.o +aufs-$(CONFIG_AUFS_DEBUG) += debug.o +aufs-$(CONFIG_AUFS_MAGIC_SYSRQ) += sysrq.o diff -urN ./fs/aufs/aufs.h ./fs/aufs/aufs.h --- ./fs/aufs/aufs.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/aufs.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,44 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * all header files + */ + +#ifndef __AUFS_H__ +#define __AUFS_H__ + +#ifdef __KERNEL__ + +#include + +#include "debug.h" + +#include "branch.h" +#include "cpup.h" +#include "dcsub.h" +#include "dbgaufs.h" +#include "dentry.h" +#include "dir.h" +#include "file.h" +#include "fstype.h" +#include "inode.h" +#include "loop.h" +#include "module.h" +#include "opts.h" +#include "rwsem.h" +#include "spl.h" +#include "super.h" +#include "sysaufs.h" +#include "vfsub.h" +#include "whout.h" +#include "wkq.h" + +#endif /* __KERNEL__ */ +#endif /* __AUFS_H__ */ diff -urN ./fs/aufs/branch.c ./fs/aufs/branch.c --- ./fs/aufs/branch.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/branch.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,946 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * branch management + */ + +#include "aufs.h" + +/* + * free a single branch + */ +static void au_br_do_free(struct au_branch *br) +{ + int i; + struct au_wbr *wbr; + + if (br->br_xino.xi_file) + fput(br->br_xino.xi_file); + mutex_destroy(&br->br_xino.xi_nondir_mtx); + + AuDebugOn(atomic_read(&br->br_count)); + + wbr = br->br_wbr; + if (wbr) { + for (i = 0; i < AuBrWh_Last; i++) + dput(wbr->wbr_wh[i]); + AuDebugOn(atomic_read(&wbr->wbr_wh_running)); + au_rwsem_destroy(&wbr->wbr_wh_rwsem); + } + + /* some filesystems acquire extra lock */ + lockdep_off(); + mntput(br->br_mnt); + lockdep_on(); + + kfree(wbr); + kfree(br); +} + +/* + * frees all branches + */ +void au_br_free(struct au_sbinfo *sbinfo) +{ + aufs_bindex_t bmax; + struct au_branch **br; + + bmax = sbinfo->si_bend + 1; + br = sbinfo->si_branch; + while (bmax--) + au_br_do_free(*br++); +} + +/* + * find the index of a branch which is specified by @br_id. + */ +int au_br_index(struct super_block *sb, aufs_bindex_t br_id) +{ + aufs_bindex_t bindex, bend; + + bend = au_sbend(sb); + for (bindex = 0; bindex <= bend; bindex++) + if (au_sbr_id(sb, bindex) == br_id) + return bindex; + return -1; +} + +/* ---------------------------------------------------------------------- */ + +/* + * add a branch + */ + +static int test_overlap(struct super_block *sb, struct dentry *h_d1, + struct dentry *h_d2) +{ + if (unlikely(h_d1 == h_d2)) + return 1; + return !!au_test_subdir(h_d1, h_d2) + || !!au_test_subdir(h_d2, h_d1) + || au_test_loopback_overlap(sb, h_d1, h_d2) + || au_test_loopback_overlap(sb, h_d2, h_d1); +} + +/* + * returns a newly allocated branch. @new_nbranch is a number of branches + * after adding a branch. + */ +static struct au_branch *au_br_alloc(struct super_block *sb, int new_nbranch, + int perm) +{ + struct au_branch *add_branch; + struct dentry *root; + + root = sb->s_root; + add_branch = kmalloc(sizeof(*add_branch), GFP_NOFS); + if (unlikely(!add_branch)) + goto out; + + add_branch->br_wbr = NULL; + if (au_br_writable(perm)) { + /* may be freed separately at changing the branch permission */ + add_branch->br_wbr = kmalloc(sizeof(*add_branch->br_wbr), + GFP_NOFS); + if (unlikely(!add_branch->br_wbr)) + goto out_br; + } + + if (unlikely(au_sbr_realloc(au_sbi(sb), new_nbranch) + || au_di_realloc(au_di(root), new_nbranch) + || au_ii_realloc(au_ii(root->d_inode), new_nbranch))) + goto out_wbr; + return add_branch; /* success */ + + out_wbr: + kfree(add_branch->br_wbr); + out_br: + kfree(add_branch); + out: + return ERR_PTR(-ENOMEM); +} + +/* + * test if the branch permission is legal or not. + */ +static int test_br(struct inode *inode, int brperm, char *path) +{ + int err; + + err = 0; + if (unlikely(au_br_writable(brperm) && IS_RDONLY(inode))) { + AuErr("write permission for readonly mount or inode, %s\n", + path); + err = -EINVAL; + } + + return err; +} + +/* + * returns: + * 0: success, the caller will add it + * plus: success, it is already unified, the caller should ignore it + * minus: error + */ +static int test_add(struct super_block *sb, struct au_opt_add *add, int remount) +{ + int err; + aufs_bindex_t bend, bindex; + struct dentry *root; + struct inode *inode, *h_inode; + + root = sb->s_root; + bend = au_sbend(sb); + if (unlikely(bend >= 0 + && au_find_dbindex(root, add->path.dentry) >= 0)) { + err = 1; + if (!remount) { + err = -EINVAL; + AuErr("%s duplicated\n", add->pathname); + } + goto out; + } + + err = -ENOSPC; /* -E2BIG; */ + if (unlikely(AUFS_BRANCH_MAX <= add->bindex + || AUFS_BRANCH_MAX - 1 <= bend)) { + AuErr("number of branches exceeded %s\n", add->pathname); + goto out; + } + + err = -EDOM; + if (unlikely(add->bindex < 0 || bend + 1 < add->bindex)) { + AuErr("bad index %d\n", add->bindex); + goto out; + } + + inode = add->path.dentry->d_inode; + err = -ENOENT; + if (unlikely(!inode->i_nlink)) { + AuErr("no existence %s\n", add->pathname); + goto out; + } + + err = -EINVAL; + if (unlikely(inode->i_sb == sb)) { + AuErr("%s must be outside\n", add->pathname); + goto out; + } + + if (unlikely(au_test_fs_unsuppoted(inode->i_sb))) { + AuErr("unsupported filesystem, %s (%s)\n", + add->pathname, au_sbtype(inode->i_sb)); + goto out; + } + + err = test_br(add->path.dentry->d_inode, add->perm, add->pathname); + if (unlikely(err)) + goto out; + + if (bend < 0) + return 0; /* success */ + + err = -EINVAL; + for (bindex = 0; bindex <= bend; bindex++) + if (unlikely(test_overlap(sb, add->path.dentry, + au_h_dptr(root, bindex)))) { + AuErr("%s is overlapped\n", add->pathname); + goto out; + } + + err = 0; + if (au_opt_test(au_mntflags(sb), WARN_PERM)) { + h_inode = au_h_dptr(root, 0)->d_inode; + if ((h_inode->i_mode & S_IALLUGO) != (inode->i_mode & S_IALLUGO) + || h_inode->i_uid != inode->i_uid + || h_inode->i_gid != inode->i_gid) + AuWarn("uid/gid/perm %s %u/%u/0%o, %u/%u/0%o\n", + add->pathname, + inode->i_uid, inode->i_gid, + (inode->i_mode & S_IALLUGO), + h_inode->i_uid, h_inode->i_gid, + (h_inode->i_mode & S_IALLUGO)); + } + + out: + return err; +} + +/* + * initialize or clean the whiteouts for an adding branch + */ +static int au_br_init_wh(struct super_block *sb, struct au_branch *br, + int new_perm, struct dentry *h_root) +{ + int err, old_perm; + aufs_bindex_t bindex; + struct mutex *h_mtx; + struct au_wbr *wbr; + struct au_hinode *hdir; + + wbr = br->br_wbr; + old_perm = br->br_perm; + br->br_perm = new_perm; + hdir = NULL; + h_mtx = NULL; + bindex = au_br_index(sb, br->br_id); + if (0 <= bindex) { + hdir = au_hi(sb->s_root->d_inode, bindex); + au_hin_imtx_lock_nested(hdir, AuLsc_I_PARENT); + } else { + h_mtx = &h_root->d_inode->i_mutex; + mutex_lock_nested(h_mtx, AuLsc_I_PARENT); + } + if (!wbr) + err = au_wh_init(h_root, br, sb); + else { + wbr_wh_write_lock(wbr); + err = au_wh_init(h_root, br, sb); + wbr_wh_write_unlock(wbr); + } + if (hdir) + au_hin_imtx_unlock(hdir); + else + mutex_unlock(h_mtx); + br->br_perm = old_perm; + + if (!err && wbr && !au_br_writable(new_perm)) { + kfree(wbr); + br->br_wbr = NULL; + } + + return err; +} + +static int au_wbr_init(struct au_branch *br, struct super_block *sb, + int perm, struct path *path) +{ + int err; + struct au_wbr *wbr; + + wbr = br->br_wbr; + init_rwsem(&wbr->wbr_wh_rwsem); + memset(wbr->wbr_wh, 0, sizeof(wbr->wbr_wh)); + atomic_set(&wbr->wbr_wh_running, 0); + wbr->wbr_bytes = 0; + + err = au_br_init_wh(sb, br, perm, path->dentry); + + return err; +} + +/* intialize a new branch */ +static int au_br_init(struct au_branch *br, struct super_block *sb, + struct au_opt_add *add) +{ + int err; + + err = 0; + memset(&br->br_xino, 0, sizeof(br->br_xino)); + mutex_init(&br->br_xino.xi_nondir_mtx); + br->br_perm = add->perm; + br->br_mnt = add->path.mnt; /* set first, mntget() later */ + atomic_set(&br->br_count, 0); + br->br_xino_upper = AUFS_XINO_TRUNC_INIT; + atomic_set(&br->br_xino_running, 0); + br->br_id = au_new_br_id(sb); + + if (au_br_writable(add->perm)) { + err = au_wbr_init(br, sb, add->perm, &add->path); + if (unlikely(err)) + goto out; + } + + if (au_opt_test(au_mntflags(sb), XINO)) { + err = au_xino_br(sb, br, add->path.dentry->d_inode->i_ino, + au_sbr(sb, 0)->br_xino.xi_file, /*do_test*/1); + if (unlikely(err)) { + AuDebugOn(br->br_xino.xi_file); + goto out; + } + } + + sysaufs_br_init(br); + mntget(add->path.mnt); + + out: + return err; +} + +static void au_br_do_add_brp(struct au_sbinfo *sbinfo, aufs_bindex_t bindex, + struct au_branch *br, aufs_bindex_t bend, + aufs_bindex_t amount) +{ + struct au_branch **brp; + + brp = sbinfo->si_branch + bindex; + memmove(brp + 1, brp, sizeof(*brp) * amount); + *brp = br; + sbinfo->si_bend++; + if (unlikely(bend < 0)) + sbinfo->si_bend = 0; +} + +static void au_br_do_add_hdp(struct au_dinfo *dinfo, aufs_bindex_t bindex, + aufs_bindex_t bend, aufs_bindex_t amount) +{ + struct au_hdentry *hdp; + + hdp = dinfo->di_hdentry + bindex; + memmove(hdp + 1, hdp, sizeof(*hdp) * amount); + au_h_dentry_init(hdp); + dinfo->di_bend++; + if (unlikely(bend < 0)) + dinfo->di_bstart = 0; +} + +static void au_br_do_add_hip(struct au_iinfo *iinfo, aufs_bindex_t bindex, + aufs_bindex_t bend, aufs_bindex_t amount) +{ + struct au_hinode *hip; + + hip = iinfo->ii_hinode + bindex; + memmove(hip + 1, hip, sizeof(*hip) * amount); + hip->hi_inode = NULL; + au_hin_init(hip, NULL); + iinfo->ii_bend++; + if (unlikely(bend < 0)) + iinfo->ii_bstart = 0; +} + +static void au_br_do_add(struct super_block *sb, struct dentry *h_dentry, + struct au_branch *br, aufs_bindex_t bindex) +{ + struct dentry *root; + struct inode *root_inode; + aufs_bindex_t bend, amount; + + root = sb->s_root; + root_inode = root->d_inode; + au_plink_block_maintain(sb); + bend = au_sbend(sb); + amount = bend + 1 - bindex; + au_br_do_add_brp(au_sbi(sb), bindex, br, bend, amount); + au_br_do_add_hdp(au_di(root), bindex, bend, amount); + au_br_do_add_hip(au_ii(root_inode), bindex, bend, amount); + au_set_h_dptr(root, bindex, dget(h_dentry)); + au_set_h_iptr(root_inode, bindex, au_igrab(h_dentry->d_inode), + /*flags*/0); +} + +int au_br_add(struct super_block *sb, struct au_opt_add *add, int remount) +{ + int err; + unsigned long long maxb; + aufs_bindex_t bend, add_bindex; + struct dentry *root, *h_dentry; + struct inode *root_inode; + struct au_branch *add_branch; + + root = sb->s_root; + root_inode = root->d_inode; + IMustLock(root_inode); + err = test_add(sb, add, remount); + if (unlikely(err < 0)) + goto out; + if (err) { + err = 0; + goto out; /* success */ + } + + bend = au_sbend(sb); + add_branch = au_br_alloc(sb, bend + 2, add->perm); + err = PTR_ERR(add_branch); + if (IS_ERR(add_branch)) + goto out; + + err = au_br_init(add_branch, sb, add); + if (unlikely(err)) { + au_br_do_free(add_branch); + goto out; + } + + add_bindex = add->bindex; + h_dentry = add->path.dentry; + if (!remount) + au_br_do_add(sb, h_dentry, add_branch, add_bindex); + else { + sysaufs_brs_del(sb, add_bindex); + au_br_do_add(sb, h_dentry, add_branch, add_bindex); + sysaufs_brs_add(sb, add_bindex); + } + + if (!add_bindex) + au_cpup_attr_all(root_inode, /*force*/1); + else + au_add_nlink(root_inode, h_dentry->d_inode); + maxb = h_dentry->d_sb->s_maxbytes; + if (sb->s_maxbytes < maxb) + sb->s_maxbytes = maxb; + + /* + * this test/set prevents aufs from handling unnecesary inotify events + * of xino files, in a case of re-adding a writable branch which was + * once detached from aufs. + */ + if (au_xino_brid(sb) < 0 + && au_br_writable(add_branch->br_perm) + && !au_test_fs_bad_xino(h_dentry->d_sb) + && add_branch->br_xino.xi_file + && add_branch->br_xino.xi_file->f_dentry->d_parent == h_dentry) + au_xino_brid_set(sb, add_branch->br_id); + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * delete a branch + */ + +/* to show the line number, do not make it inlined function */ +#define AuVerbose(do_info, fmt, args...) do { \ + if (do_info) \ + AuInfo(fmt, ##args); \ +} while (0) + +/* + * test if the branch is deletable or not. + */ +static int test_dentry_busy(struct dentry *root, aufs_bindex_t bindex, + unsigned int sigen) +{ + int err, i, j, ndentry; + aufs_bindex_t bstart, bend; + unsigned char verbose; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + struct dentry *d; + struct inode *inode; + + err = au_dpages_init(&dpages, GFP_NOFS); + if (unlikely(err)) + goto out; + err = au_dcsub_pages(&dpages, root, NULL, NULL); + if (unlikely(err)) + goto out_dpages; + + verbose = !!au_opt_test(au_mntflags(root->d_sb), VERBOSE); + for (i = 0; !err && i < dpages.ndpage; i++) { + dpage = dpages.dpages + i; + ndentry = dpage->ndentry; + for (j = 0; !err && j < ndentry; j++) { + d = dpage->dentries[j]; + AuDebugOn(!atomic_read(&d->d_count)); + inode = d->d_inode; + if (au_digen(d) == sigen && au_iigen(inode) == sigen) + di_read_lock_child(d, AuLock_IR); + else { + di_write_lock_child(d); + err = au_reval_dpath(d, sigen); + if (!err) + di_downgrade_lock(d, AuLock_IR); + else { + di_write_unlock(d); + break; + } + } + + bstart = au_dbstart(d); + bend = au_dbend(d); + if (bstart <= bindex + && bindex <= bend + && au_h_dptr(d, bindex) + && (!S_ISDIR(inode->i_mode) || bstart == bend)) { + err = -EBUSY; + AuVerbose(verbose, "busy %.*s\n", AuDLNPair(d)); + } + di_read_unlock(d, AuLock_IR); + } + } + + out_dpages: + au_dpages_free(&dpages); + out: + return err; +} + +static int test_inode_busy(struct super_block *sb, aufs_bindex_t bindex, + unsigned int sigen) +{ + int err; + struct inode *i; + aufs_bindex_t bstart, bend; + unsigned char verbose; + + err = 0; + verbose = !!au_opt_test(au_mntflags(sb), VERBOSE); + list_for_each_entry(i, &sb->s_inodes, i_sb_list) { + AuDebugOn(!atomic_read(&i->i_count)); + if (!list_empty(&i->i_dentry)) + continue; + + if (au_iigen(i) == sigen) + ii_read_lock_child(i); + else { + ii_write_lock_child(i); + err = au_refresh_hinode_self(i, /*do_attr*/1); + if (!err) + ii_downgrade_lock(i); + else { + ii_write_unlock(i); + break; + } + } + + bstart = au_ibstart(i); + bend = au_ibend(i); + if (bstart <= bindex + && bindex <= bend + && au_h_iptr(i, bindex) + && (!S_ISDIR(i->i_mode) || bstart == bend)) { + err = -EBUSY; + AuVerbose(verbose, "busy i%lu\n", i->i_ino); + ii_read_unlock(i); + break; + } + ii_read_unlock(i); + } + + return err; +} + +static int test_children_busy(struct dentry *root, aufs_bindex_t bindex) +{ + int err; + unsigned int sigen; + + sigen = au_sigen(root->d_sb); + DiMustNoWaiters(root); + IiMustNoWaiters(root->d_inode); + di_write_unlock(root); + err = test_dentry_busy(root, bindex, sigen); + if (!err) + err = test_inode_busy(root->d_sb, bindex, sigen); + di_write_lock_child(root); /* aufs_write_lock() calls ..._child() */ + + return err; +} + +static void au_br_do_del_brp(struct au_sbinfo *sbinfo, + const aufs_bindex_t bindex, + const aufs_bindex_t bend) +{ + struct au_branch **brp, **p; + + brp = sbinfo->si_branch + bindex; + if (bindex < bend) + memmove(brp, brp + 1, sizeof(*brp) * (bend - bindex)); + sbinfo->si_branch[0 + bend] = NULL; + sbinfo->si_bend--; + + p = krealloc(sbinfo->si_branch, sizeof(*p) * bend, GFP_NOFS); + if (p) + sbinfo->si_branch = p; +} + +static void au_br_do_del_hdp(struct au_dinfo *dinfo, const aufs_bindex_t bindex, + const aufs_bindex_t bend) +{ + struct au_hdentry *hdp, *p; + + hdp = dinfo->di_hdentry + bindex; + if (bindex < bend) + memmove(hdp, hdp + 1, sizeof(*hdp) * (bend - bindex)); + dinfo->di_hdentry[0 + bend].hd_dentry = NULL; + dinfo->di_bend--; + + p = krealloc(dinfo->di_hdentry, sizeof(*p) * bend, GFP_NOFS); + if (p) + dinfo->di_hdentry = p; +} + +static void au_br_do_del_hip(struct au_iinfo *iinfo, const aufs_bindex_t bindex, + const aufs_bindex_t bend) +{ + struct au_hinode *hip, *p; + + hip = iinfo->ii_hinode + bindex; + if (bindex < bend) + memmove(hip, hip + 1, sizeof(*hip) * (bend - bindex)); + iinfo->ii_hinode[0 + bend].hi_inode = NULL; + au_hin_init(iinfo->ii_hinode + bend, NULL); + iinfo->ii_bend--; + + p = krealloc(iinfo->ii_hinode, sizeof(*p) * bend, GFP_NOFS); + if (p) + iinfo->ii_hinode = p; +} + +static void au_br_do_del(struct super_block *sb, aufs_bindex_t bindex, + struct au_branch *br) +{ + aufs_bindex_t bend; + struct au_sbinfo *sbinfo; + struct dentry *root; + struct inode *inode; + + root = sb->s_root; + inode = root->d_inode; + au_plink_block_maintain(sb); + sbinfo = au_sbi(sb); + bend = sbinfo->si_bend; + + dput(au_h_dptr(root, bindex)); + au_hiput(au_hi(inode, bindex)); + au_br_do_free(br); + + au_br_do_del_brp(sbinfo, bindex, bend); + au_br_do_del_hdp(au_di(root), bindex, bend); + au_br_do_del_hip(au_ii(inode), bindex, bend); +} + +int au_br_del(struct super_block *sb, struct au_opt_del *del, int remount) +{ + int err, rerr, i; + unsigned int mnt_flags; + aufs_bindex_t bindex, bend, br_id; + unsigned char do_wh, verbose; + struct au_branch *br; + struct au_wbr *wbr; + + err = 0; + bindex = au_find_dbindex(sb->s_root, del->h_path.dentry); + if (bindex < 0) { + if (remount) + goto out; /* success */ + err = -ENOENT; + AuErr("%s no such branch\n", del->pathname); + goto out; + } + AuDbg("bindex b%d\n", bindex); + + err = -EBUSY; + mnt_flags = au_mntflags(sb); + verbose = !!au_opt_test(mnt_flags, VERBOSE); + bend = au_sbend(sb); + if (unlikely(!bend)) { + AuVerbose(verbose, "no more branches left\n"); + goto out; + } + br = au_sbr(sb, bindex); + i = atomic_read(&br->br_count); + if (unlikely(i)) { + AuVerbose(verbose, "%d file(s) opened\n", i); + goto out; + } + + wbr = br->br_wbr; + do_wh = wbr && (wbr->wbr_whbase || wbr->wbr_plink || wbr->wbr_orph); + if (do_wh) { + for (i = 0; i < AuBrWh_Last; i++) { + dput(wbr->wbr_wh[i]); + wbr->wbr_wh[i] = NULL; + } + } + + err = test_children_busy(sb->s_root, bindex); + if (unlikely(err)) { + if (do_wh) + goto out_wh; + goto out; + } + + err = 0; + br_id = br->br_id; + if (!remount) + au_br_do_del(sb, bindex, br); + else { + sysaufs_brs_del(sb, bindex); + au_br_do_del(sb, bindex, br); + sysaufs_brs_add(sb, bindex); + } + + if (!bindex) + au_cpup_attr_all(sb->s_root->d_inode, /*force*/1); + else + au_sub_nlink(sb->s_root->d_inode, del->h_path.dentry->d_inode); + if (au_opt_test(mnt_flags, PLINK)) + au_plink_half_refresh(sb, br_id); + + if (sb->s_maxbytes == del->h_path.dentry->d_sb->s_maxbytes) { + bend--; + sb->s_maxbytes = 0; + for (bindex = 0; bindex <= bend; bindex++) { + unsigned long long maxb; + + maxb = au_sbr_sb(sb, bindex)->s_maxbytes; + if (sb->s_maxbytes < maxb) + sb->s_maxbytes = maxb; + } + } + + if (au_xino_brid(sb) == br->br_id) + au_xino_brid_set(sb, -1); + goto out; /* success */ + + out_wh: + /* revert */ + rerr = au_br_init_wh(sb, br, br->br_perm, del->h_path.dentry); + if (rerr) + AuWarn("failed re-creating base whiteout, %s. (%d)\n", + del->pathname, rerr); + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * change a branch permission + */ + +static int do_need_sigen_inc(int a, int b) +{ + return au_br_whable(a) && !au_br_whable(b); +} + +static int need_sigen_inc(int old, int new) +{ + return do_need_sigen_inc(old, new) + || do_need_sigen_inc(new, old); +} + +static int au_br_mod_files_ro(struct super_block *sb, aufs_bindex_t bindex) +{ + int err; + unsigned long n, ul, bytes, files; + aufs_bindex_t bstart; + struct file *file, *hf, **a; + const int step_bytes = 1024, /* memory allocation unit */ + step_files = step_bytes / sizeof(*a); + + err = -ENOMEM; + n = 0; + bytes = step_bytes; + files = step_files; + a = kmalloc(bytes, GFP_NOFS); + if (unlikely(!a)) + goto out; + + /* no need file_list_lock() since sbinfo is locked? defered? */ + list_for_each_entry(file, &sb->s_files, f_u.fu_list) { + if (special_file(file->f_dentry->d_inode->i_mode)) + continue; + + AuDbg("%.*s\n", AuDLNPair(file->f_dentry)); + fi_read_lock(file); + if (unlikely(au_test_mmapped(file))) { + err = -EBUSY; + FiMustNoWaiters(file); + fi_read_unlock(file); + goto out_free; + } + + bstart = au_fbstart(file); + if (!S_ISREG(file->f_dentry->d_inode->i_mode) + || !(file->f_mode & FMODE_WRITE) + || bstart != bindex) { + FiMustNoWaiters(file); + fi_read_unlock(file); + continue; + } + + hf = au_h_fptr(file, bstart); + FiMustNoWaiters(file); + fi_read_unlock(file); + + if (n < files) + a[n++] = hf; + else { + void *p; + + err = -ENOMEM; + bytes += step_bytes; + files += step_files; + p = krealloc(a, bytes, GFP_NOFS); + if (p) { + a = p; + a[n++] = hf; + } else + goto out_free; + } + } + + err = 0; + for (ul = 0; ul < n; ul++) { + /* todo: already flushed? */ + /* cf. fs/super.c:mark_files_ro() */ + hf = a[ul]; + hf->f_mode &= ~FMODE_WRITE; + if (!file_check_writeable(hf)) { + file_release_write(hf); + mnt_drop_write(hf->f_vfsmnt); + } + } + + out_free: + kfree(a); + out: + return err; +} + +int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount, + int *do_update) +{ + int err, rerr; + aufs_bindex_t bindex; + struct dentry *root; + struct au_branch *br; + + root = sb->s_root; + au_plink_block_maintain(sb); + bindex = au_find_dbindex(root, mod->h_root); + if (bindex < 0) { + if (remount) + return 0; /* success */ + err = -ENOENT; + AuErr("%s no such branch\n", mod->path); + goto out; + } + AuDbg("bindex b%d\n", bindex); + + err = test_br(mod->h_root->d_inode, mod->perm, mod->path); + if (unlikely(err)) + goto out; + + br = au_sbr(sb, bindex); + if (br->br_perm == mod->perm) + return 0; /* success */ + + if (au_br_writable(br->br_perm)) { + /* remove whiteout base */ + err = au_br_init_wh(sb, br, mod->perm, mod->h_root); + if (unlikely(err)) + goto out; + + if (!au_br_writable(mod->perm)) { + /* rw --> ro, file might be mmapped */ + DiMustNoWaiters(root); + IiMustNoWaiters(root->d_inode); + di_write_unlock(root); + err = au_br_mod_files_ro(sb, bindex); + /* aufs_write_lock() calls ..._child() */ + di_write_lock_child(root); + + if (unlikely(err)) { + rerr = -ENOMEM; + br->br_wbr = kmalloc(sizeof(*br->br_wbr), + GFP_NOFS); + if (br->br_wbr) + rerr = au_br_init_wh + (sb, br, br->br_perm, + mod->h_root); + if (unlikely(rerr)) { + AuIOErr("nested error %d (%d)\n", + rerr, err); + br->br_perm = mod->perm; + } + } + } + } else if (au_br_writable(mod->perm)) { + /* ro --> rw */ + err = -ENOMEM; + br->br_wbr = kmalloc(sizeof(*br->br_wbr), GFP_NOFS); + if (br->br_wbr) { + struct path path = { + .mnt = br->br_mnt, + .dentry = mod->h_root + }; + + err = au_wbr_init(br, sb, mod->perm, &path); + if (unlikely(err)) { + kfree(br->br_wbr); + br->br_wbr = NULL; + } + } + } + + if (!err) { + *do_update |= need_sigen_inc(br->br_perm, mod->perm); + br->br_perm = mod->perm; + } + + out: + return err; +} diff -urN ./fs/aufs/branch.h ./fs/aufs/branch.h --- ./fs/aufs/branch.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/branch.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,207 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * branch filesystems and xino for them + */ + +#ifndef __AUFS_BRANCH_H__ +#define __AUFS_BRANCH_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include "rwsem.h" +#include "super.h" + +/* ---------------------------------------------------------------------- */ + +/* a xino file */ +struct au_xino_file { + struct file *xi_file; + struct mutex xi_nondir_mtx; + + /* todo: make xino files an array to support huge inode number */ + +#ifdef CONFIG_DEBUG_FS + struct dentry *xi_dbgaufs; +#endif +}; + +/* members for writable branch only */ +enum {AuBrWh_BASE, AuBrWh_PLINK, AuBrWh_ORPH, AuBrWh_Last}; +struct au_wbr { + struct rw_semaphore wbr_wh_rwsem; + struct dentry *wbr_wh[AuBrWh_Last]; + atomic_t wbr_wh_running; +#define wbr_whbase wbr_wh[AuBrWh_BASE] /* whiteout base */ +#define wbr_plink wbr_wh[AuBrWh_PLINK] /* pseudo-link dir */ +#define wbr_orph wbr_wh[AuBrWh_ORPH] /* dir for orphans */ + + /* mfs mode */ + unsigned long long wbr_bytes; +}; + +/* protected by superblock rwsem */ +struct au_branch { + struct au_xino_file br_xino; + + aufs_bindex_t br_id; + + int br_perm; + struct vfsmount *br_mnt; + atomic_t br_count; + + struct au_wbr *br_wbr; + + /* xino truncation */ + blkcnt_t br_xino_upper; /* watermark in blocks */ + atomic_t br_xino_running; + +#ifdef CONFIG_SYSFS + /* an entry under sysfs per mount-point */ + char br_name[8]; + struct attribute br_attr; +#endif +}; + +/* ---------------------------------------------------------------------- */ + +/* branch permission and attribute */ +enum { + AuBrPerm_RW, /* writable, linkable wh */ + AuBrPerm_RO, /* readonly, no wh */ + AuBrPerm_RR, /* natively readonly, no wh */ + + AuBrPerm_RWNoLinkWH, /* un-linkable whiteouts */ + + AuBrPerm_ROWH, /* whiteout-able */ + AuBrPerm_RRWH, /* whiteout-able */ + + AuBrPerm_Last +}; + +static inline int au_br_writable(int brperm) +{ + return brperm == AuBrPerm_RW || brperm == AuBrPerm_RWNoLinkWH; +} + +static inline int au_br_whable(int brperm) +{ + return brperm == AuBrPerm_RW + || brperm == AuBrPerm_ROWH + || brperm == AuBrPerm_RRWH; +} + +static inline int au_br_rdonly(struct au_branch *br) +{ + return ((br->br_mnt->mnt_sb->s_flags & MS_RDONLY) + || !au_br_writable(br->br_perm)) + ? -EROFS : 0; +} + +static inline int au_br_hinotifyable(int brperm __maybe_unused) +{ +#ifdef CONFIG_AUFS_HINOTIFY + return brperm != AuBrPerm_RR && brperm != AuBrPerm_RRWH; +#else + return 0; +#endif +} + +/* ---------------------------------------------------------------------- */ + +/* branch.c */ +struct au_sbinfo; +void au_br_free(struct au_sbinfo *sinfo); +int au_br_index(struct super_block *sb, aufs_bindex_t br_id); +struct au_opt_add; +int au_br_add(struct super_block *sb, struct au_opt_add *add, int remount); +struct au_opt_del; +int au_br_del(struct super_block *sb, struct au_opt_del *del, int remount); +struct au_opt_mod; +int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount, + int *do_update); + +/* xino.c */ +static const loff_t au_loff_max = LLONG_MAX; + +int au_xib_trunc(struct super_block *sb); +ssize_t xino_fread(au_readf_t func, struct file *file, void *buf, size_t size, + loff_t *pos); +ssize_t xino_fwrite(au_writef_t func, struct file *file, void *buf, size_t size, + loff_t *pos); +struct file *au_xino_create2(struct file *base_file, struct file *copy_src); +struct file *au_xino_create(struct super_block *sb, char *fname, int silent); +ino_t au_xino_new_ino(struct super_block *sb); +int au_xino_write0(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t ino); +int au_xino_write(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t ino); +int au_xino_read(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t *ino); +int au_xino_br(struct super_block *sb, struct au_branch *br, ino_t hino, + struct file *base_file, int do_test); +int au_xino_trunc(struct super_block *sb, aufs_bindex_t bindex); + +struct au_opt_xino; +int au_xino_set(struct super_block *sb, struct au_opt_xino *xino, int remount); +void au_xino_clr(struct super_block *sb); +struct file *au_xino_def(struct super_block *sb); +int au_xino_path(struct seq_file *seq, struct file *file); + +/* ---------------------------------------------------------------------- */ + +/* Superblock to branch */ +static inline +aufs_bindex_t au_sbr_id(struct super_block *sb, aufs_bindex_t bindex) +{ + return au_sbr(sb, bindex)->br_id; +} + +static inline +struct vfsmount *au_sbr_mnt(struct super_block *sb, aufs_bindex_t bindex) +{ + return au_sbr(sb, bindex)->br_mnt; +} + +static inline +struct super_block *au_sbr_sb(struct super_block *sb, aufs_bindex_t bindex) +{ + return au_sbr_mnt(sb, bindex)->mnt_sb; +} + +static inline void au_sbr_put(struct super_block *sb, aufs_bindex_t bindex) +{ + atomic_dec(&au_sbr(sb, bindex)->br_count); +} + +static inline int au_sbr_perm(struct super_block *sb, aufs_bindex_t bindex) +{ + return au_sbr(sb, bindex)->br_perm; +} + +static inline int au_sbr_whable(struct super_block *sb, aufs_bindex_t bindex) +{ + return au_br_whable(au_sbr_perm(sb, bindex)); +} + +/* ---------------------------------------------------------------------- */ + +/* + * wbr_wh_read_lock, wbr_wh_write_lock + * wbr_wh_read_unlock, wbr_wh_write_unlock, wbr_wh_downgrade_lock + */ +AuSimpleRwsemFuncs(wbr_wh, struct au_wbr *wbr, &wbr->wbr_wh_rwsem); + +#endif /* __KERNEL__ */ +#endif /* __AUFS_BRANCH_H__ */ diff -urN ./fs/aufs/cpup.c ./fs/aufs/cpup.c --- ./fs/aufs/cpup.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/cpup.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,1023 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * copy-up functions, see wbr_policy.c for copy-down + */ + +#include +#include +#include "aufs.h" + +void au_cpup_attr_flags(struct inode *dst, struct inode *src) +{ + const unsigned int mask = S_DEAD | S_SWAPFILE | S_PRIVATE + | S_NOATIME | S_NOCMTIME; + + dst->i_flags |= src->i_flags & ~mask; + if (au_test_fs_notime(dst->i_sb)) + dst->i_flags |= S_NOATIME | S_NOCMTIME; +} + +void au_cpup_attr_timesizes(struct inode *inode) +{ + struct inode *h_inode; + + h_inode = au_h_iptr(inode, au_ibstart(inode)); + fsstack_copy_attr_times(inode, h_inode); + vfsub_copy_inode_size(inode, h_inode); +} + +void au_cpup_attr_nlink(struct inode *inode, int force) +{ + struct inode *h_inode; + struct super_block *sb; + aufs_bindex_t bindex, bend; + + sb = inode->i_sb; + bindex = au_ibstart(inode); + h_inode = au_h_iptr(inode, bindex); + if (!force + && !S_ISDIR(h_inode->i_mode) + && au_opt_test(au_mntflags(sb), PLINK) + && au_plink_test(inode)) + return; + + inode->i_nlink = h_inode->i_nlink; + + /* + * fewer nlink makes find(1) noisy, but larger nlink doesn't. + * it may includes whplink directory. + */ + if (S_ISDIR(h_inode->i_mode)) { + bend = au_ibend(inode); + for (bindex++; bindex <= bend; bindex++) { + h_inode = au_h_iptr(inode, bindex); + if (h_inode) + au_add_nlink(inode, h_inode); + } + } +} + +void au_cpup_attr_changeable(struct inode *inode) +{ + struct inode *h_inode; + + h_inode = au_h_iptr(inode, au_ibstart(inode)); + inode->i_mode = h_inode->i_mode; + inode->i_uid = h_inode->i_uid; + inode->i_gid = h_inode->i_gid; + au_cpup_attr_timesizes(inode); + au_cpup_attr_flags(inode, h_inode); +} + +void au_cpup_igen(struct inode *inode, struct inode *h_inode) +{ + struct au_iinfo *iinfo = au_ii(inode); + + iinfo->ii_higen = h_inode->i_generation; + iinfo->ii_hsb1 = h_inode->i_sb; +} + +void au_cpup_attr_all(struct inode *inode, int force) +{ + struct inode *h_inode; + + h_inode = au_h_iptr(inode, au_ibstart(inode)); + au_cpup_attr_changeable(inode); + if (inode->i_nlink > 0) + au_cpup_attr_nlink(inode, force); + inode->i_rdev = h_inode->i_rdev; + inode->i_blkbits = h_inode->i_blkbits; + au_cpup_igen(inode, h_inode); +} + +/* ---------------------------------------------------------------------- */ + +/* Note: dt_dentry and dt_h_dentry are not dget/dput-ed */ + +/* keep the timestamps of the parent dir when cpup */ +void au_dtime_store(struct au_dtime *dt, struct dentry *dentry, + struct path *h_path) +{ + struct inode *h_inode; + + dt->dt_dentry = dentry; + dt->dt_h_path = *h_path; + h_inode = h_path->dentry->d_inode; + dt->dt_atime = h_inode->i_atime; + dt->dt_mtime = h_inode->i_mtime; + /* smp_mb(); */ +} + +void au_dtime_revert(struct au_dtime *dt) +{ + struct iattr attr; + int err; + + attr.ia_atime = dt->dt_atime; + attr.ia_mtime = dt->dt_mtime; + attr.ia_valid = ATTR_FORCE | ATTR_MTIME | ATTR_MTIME_SET + | ATTR_ATIME | ATTR_ATIME_SET; + + err = vfsub_notify_change(&dt->dt_h_path, &attr); + if (unlikely(err)) + AuWarn("restoring timestamps failed(%d). ignored\n", err); +} + +/* ---------------------------------------------------------------------- */ + +static noinline_for_stack +int cpup_iattr(struct dentry *dst, aufs_bindex_t bindex, struct dentry *h_src) +{ + int err, sbits; + struct iattr ia; + struct path h_path; + struct inode *h_isrc; + + h_path.dentry = au_h_dptr(dst, bindex); + h_path.mnt = au_sbr_mnt(dst->d_sb, bindex); + h_isrc = h_src->d_inode; + ia.ia_valid = ATTR_FORCE | ATTR_MODE | ATTR_UID | ATTR_GID + | ATTR_ATIME | ATTR_MTIME + | ATTR_ATIME_SET | ATTR_MTIME_SET; + ia.ia_mode = h_isrc->i_mode; + ia.ia_uid = h_isrc->i_uid; + ia.ia_gid = h_isrc->i_gid; + ia.ia_atime = h_isrc->i_atime; + ia.ia_mtime = h_isrc->i_mtime; + sbits = !!(ia.ia_mode & (S_ISUID | S_ISGID)); + au_cpup_attr_flags(h_path.dentry->d_inode, h_isrc); + err = vfsub_notify_change(&h_path, &ia); + + /* is this nfs only? */ + if (!err && sbits && au_test_nfs(h_path.dentry->d_sb)) { + ia.ia_valid = ATTR_FORCE | ATTR_MODE; + ia.ia_mode = h_isrc->i_mode; + err = vfsub_notify_change(&h_path, &ia); + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int au_do_copy_file(struct file *dst, struct file *src, loff_t len, + char *buf, unsigned long blksize) +{ + int err; + size_t sz, rbytes, wbytes; + unsigned char all_zero; + char *p, *zp; + struct mutex *h_mtx; + /* reduce stack usage */ + struct iattr *ia; + + zp = page_address(ZERO_PAGE(0)); + if (unlikely(!zp)) + return -ENOMEM; /* possible? */ + + err = 0; + all_zero = 0; + while (len) { + AuDbg("len %lld\n", len); + sz = blksize; + if (len < blksize) + sz = len; + + rbytes = 0; + /* todo: signal_pending? */ + while (!rbytes || err == -EAGAIN || err == -EINTR) { + rbytes = vfsub_read_k(src, buf, sz, &src->f_pos); + err = rbytes; + } + if (unlikely(err < 0)) + break; + + all_zero = 0; + if (len >= rbytes && rbytes == blksize) + all_zero = !memcmp(buf, zp, rbytes); + if (!all_zero) { + wbytes = rbytes; + p = buf; + while (wbytes) { + size_t b; + + b = vfsub_write_k(dst, p, wbytes, &dst->f_pos); + err = b; + /* todo: signal_pending? */ + if (unlikely(err == -EAGAIN || err == -EINTR)) + continue; + if (unlikely(err < 0)) + break; + wbytes -= b; + p += b; + } + } else { + loff_t res; + + AuLabel(hole); + res = vfsub_llseek(dst, rbytes, SEEK_CUR); + err = res; + if (unlikely(res < 0)) + break; + } + len -= rbytes; + err = 0; + } + + /* the last block may be a hole */ + if (!err && all_zero) { + AuLabel(last hole); + + err = 1; + if (au_test_nfs(dst->f_dentry->d_sb)) { + /* nfs requires this step to make last hole */ + /* is this only nfs? */ + do { + /* todo: signal_pending? */ + err = vfsub_write_k(dst, "\0", 1, &dst->f_pos); + } while (err == -EAGAIN || err == -EINTR); + if (err == 1) + dst->f_pos--; + } + + if (err == 1) { + ia = (void *)buf; + ia->ia_size = dst->f_pos; + ia->ia_valid = ATTR_SIZE | ATTR_FILE; + ia->ia_file = dst; + h_mtx = &dst->f_dentry->d_inode->i_mutex; + mutex_lock_nested(h_mtx, AuLsc_I_CHILD2); + err = vfsub_notify_change(&dst->f_path, ia); + mutex_unlock(h_mtx); + } + } + + return err; +} + +int au_copy_file(struct file *dst, struct file *src, loff_t len) +{ + int err; + unsigned long blksize; + unsigned char do_kfree; + char *buf; + + err = -ENOMEM; + blksize = dst->f_dentry->d_sb->s_blocksize; + if (!blksize || PAGE_SIZE < blksize) + blksize = PAGE_SIZE; + AuDbg("blksize %lu\n", blksize); + do_kfree = (blksize != PAGE_SIZE && blksize >= sizeof(struct iattr *)); + if (do_kfree) + buf = kmalloc(blksize, GFP_NOFS); + else + buf = (void *)__get_free_page(GFP_NOFS); + if (unlikely(!buf)) + goto out; + + if (len > (1 << 22)) + AuDbg("copying a large file %lld\n", (long long)len); + + src->f_pos = 0; + dst->f_pos = 0; + err = au_do_copy_file(dst, src, len, buf, blksize); + if (do_kfree) + kfree(buf); + else + free_page((unsigned long)buf); + + out: + return err; +} + +/* + * to support a sparse file which is opened with O_APPEND, + * we need to close the file. + */ +static int au_cp_regular(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len) +{ + int err, i; + enum { SRC, DST }; + struct { + aufs_bindex_t bindex; + unsigned int flags; + struct dentry *dentry; + struct file *file; + void *label; + } *f, file[] = { + { + .bindex = bsrc, + .flags = O_RDONLY | O_NOATIME | O_LARGEFILE, + .file = NULL, + .label = &&out, + }, + { + .bindex = bdst, + .flags = O_WRONLY | O_NOATIME | O_LARGEFILE, + .file = NULL, + .label = &&out_src, + } + }; + struct super_block *sb; + + /* bsrc branch can be ro/rw. */ + sb = dentry->d_sb; + f = file; + for (i = 0; i < 2; i++, f++) { + f->dentry = au_h_dptr(dentry, f->bindex); + f->file = au_h_open(dentry, f->bindex, f->flags, /*file*/NULL); + err = PTR_ERR(f->file); + if (IS_ERR(f->file)) + goto *f->label; + } + + /* try stopping to update while we copyup */ + IMustLock(file[SRC].dentry->d_inode); + err = au_copy_file(file[DST].file, file[SRC].file, len); + + fput(file[DST].file); + au_sbr_put(sb, file[DST].bindex); + + out_src: + fput(file[SRC].file); + au_sbr_put(sb, file[SRC].bindex); + out: + return err; +} + +static int au_do_cpup_regular(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len, + struct inode *h_dir, struct path *h_path) +{ + int err, rerr; + loff_t l; + + err = 0; + l = i_size_read(au_h_iptr(dentry->d_inode, bsrc)); + if (len == -1 || l < len) + len = l; + if (len) + err = au_cp_regular(dentry, bdst, bsrc, len); + if (!err) + goto out; /* success */ + + rerr = vfsub_unlink(h_dir, h_path, /*force*/0); + if (rerr) { + AuIOErr("failed unlinking cpup-ed %.*s(%d, %d)\n", + AuDLNPair(h_path->dentry), err, rerr); + err = -EIO; + } + + out: + return err; +} + +static int au_do_cpup_symlink(struct path *h_path, struct dentry *h_src, + struct inode *h_dir) +{ + int err, symlen; + mm_segment_t old_fs; + char *sym; + + err = -ENOSYS; + if (unlikely(!h_src->d_inode->i_op->readlink)) + goto out; + + err = -ENOMEM; + sym = __getname(); + if (unlikely(!sym)) + goto out; + + old_fs = get_fs(); + set_fs(KERNEL_DS); + symlen = h_src->d_inode->i_op->readlink(h_src, (char __user *)sym, + PATH_MAX); + err = symlen; + set_fs(old_fs); + + if (symlen > 0) { + sym[symlen] = 0; + err = vfsub_symlink(h_dir, h_path, sym); + } + __putname(sym); + + out: + return err; +} + +/* return with the lower dst inode is locked */ +static noinline_for_stack +int cpup_entry(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len, unsigned int flags, + struct dentry *dst_parent) +{ + int err; + umode_t mode; + unsigned int mnt_flags; + unsigned char isdir; + const unsigned char do_dt = !!au_ftest_cpup(flags, DTIME); + struct au_dtime dt; + struct path h_path; + struct dentry *h_src, *h_dst, *h_parent; + struct inode *h_inode, *h_dir; + struct super_block *sb; + + /* bsrc branch can be ro/rw. */ + h_src = au_h_dptr(dentry, bsrc); + h_inode = h_src->d_inode; + AuDebugOn(h_inode != au_h_iptr(dentry->d_inode, bsrc)); + + /* try stopping to be referenced while we are creating */ + h_dst = au_h_dptr(dentry, bdst); + h_parent = h_dst->d_parent; /* dir inode is locked */ + h_dir = h_parent->d_inode; + IMustLock(h_dir); + AuDebugOn(h_parent != h_dst->d_parent); + + sb = dentry->d_sb; + h_path.mnt = au_sbr_mnt(sb, bdst); + if (do_dt) { + h_path.dentry = h_parent; + au_dtime_store(&dt, dst_parent, &h_path); + } + h_path.dentry = h_dst; + + isdir = 0; + mode = h_inode->i_mode; + switch (mode & S_IFMT) { + case S_IFREG: + /* try stopping to update while we are referencing */ + IMustLock(h_inode); + err = vfsub_create(h_dir, &h_path, mode | S_IWUSR); + if (!err) + err = au_do_cpup_regular + (dentry, bdst, bsrc, len, + au_h_iptr(dst_parent->d_inode, bdst), &h_path); + break; + case S_IFDIR: + isdir = 1; + err = vfsub_mkdir(h_dir, &h_path, mode); + if (!err) { + /* + * strange behaviour from the users view, + * particularry setattr case + */ + if (au_ibstart(dst_parent->d_inode) == bdst) + au_cpup_attr_nlink(dst_parent->d_inode, + /*force*/1); + au_cpup_attr_nlink(dentry->d_inode, /*force*/1); + } + break; + case S_IFLNK: + err = au_do_cpup_symlink(&h_path, h_src, h_dir); + break; + case S_IFCHR: + case S_IFBLK: + AuDebugOn(!capable(CAP_MKNOD)); + /*FALLTHROUGH*/ + case S_IFIFO: + case S_IFSOCK: + err = vfsub_mknod(h_dir, &h_path, mode, h_inode->i_rdev); + break; + default: + AuIOErr("Unknown inode type 0%o\n", mode); + err = -EIO; + } + + mnt_flags = au_mntflags(sb); + if (!au_opt_test(mnt_flags, UDBA_NONE) + && !isdir + && au_opt_test(mnt_flags, XINO) + && h_inode->i_nlink == 1 + /* todo: unnecessary? */ + /* && dentry->d_inode->i_nlink == 1 */ + && bdst < bsrc + && !au_ftest_cpup(flags, KEEPLINO)) + au_xino_write0(sb, bsrc, h_inode->i_ino, /*ino*/0); + /* ignore this error */ + + if (do_dt) + au_dtime_revert(&dt); + return err; +} + +/* + * copyup the @dentry from @bsrc to @bdst. + * the caller must set the both of lower dentries. + * @len is for truncating when it is -1 copyup the entire file. + * in link/rename cases, @dst_parent may be different from the real one. + */ +static int au_cpup_single(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len, unsigned int flags, + struct dentry *dst_parent) +{ + int err, rerr; + aufs_bindex_t old_ibstart; + unsigned char isdir, plink; + struct au_dtime dt; + struct path h_path; + struct dentry *h_src, *h_dst, *h_parent; + struct inode *dst_inode, *h_dir, *inode; + struct super_block *sb; + + AuDebugOn(bsrc <= bdst); + + sb = dentry->d_sb; + h_path.mnt = au_sbr_mnt(sb, bdst); + h_dst = au_h_dptr(dentry, bdst); + h_parent = h_dst->d_parent; /* dir inode is locked */ + h_dir = h_parent->d_inode; + IMustLock(h_dir); + + h_src = au_h_dptr(dentry, bsrc); + inode = dentry->d_inode; + + if (!dst_parent) + dst_parent = dget_parent(dentry); + else + dget(dst_parent); + + plink = !!au_opt_test(au_mntflags(sb), PLINK); + dst_inode = au_h_iptr(inode, bdst); + if (dst_inode) { + if (unlikely(!plink)) { + err = -EIO; + AuIOErr("i%lu exists on a upper branch " + "but plink is disabled\n", inode->i_ino); + goto out; + } + + if (dst_inode->i_nlink) { + const int do_dt = au_ftest_cpup(flags, DTIME); + + h_src = au_plink_lkup(inode, bdst); + err = PTR_ERR(h_src); + if (IS_ERR(h_src)) + goto out; + if (unlikely(!h_src->d_inode)) { + err = -EIO; + AuIOErr("i%lu exists on a upper branch " + "but plink is broken\n", inode->i_ino); + dput(h_src); + goto out; + } + + if (do_dt) { + h_path.dentry = h_parent; + au_dtime_store(&dt, dst_parent, &h_path); + } + h_path.dentry = h_dst; + err = vfsub_link(h_src, h_dir, &h_path); + if (do_dt) + au_dtime_revert(&dt); + dput(h_src); + goto out; + } else + /* todo: cpup_wh_file? */ + /* udba work */ + au_update_brange(inode, 1); + } + + old_ibstart = au_ibstart(inode); + err = cpup_entry(dentry, bdst, bsrc, len, flags, dst_parent); + if (unlikely(err)) + goto out; + dst_inode = h_dst->d_inode; + mutex_lock_nested(&dst_inode->i_mutex, AuLsc_I_CHILD2); + + err = cpup_iattr(dentry, bdst, h_src); + isdir = S_ISDIR(dst_inode->i_mode); + if (!err) { + if (bdst < old_ibstart) + au_set_ibstart(inode, bdst); + au_set_h_iptr(inode, bdst, au_igrab(dst_inode), + au_hi_flags(inode, isdir)); + mutex_unlock(&dst_inode->i_mutex); + if (!isdir + && h_src->d_inode->i_nlink > 1 + && plink) + au_plink_append(inode, bdst, h_dst); + goto out; /* success */ + } + + /* revert */ + h_path.dentry = h_parent; + mutex_unlock(&dst_inode->i_mutex); + au_dtime_store(&dt, dst_parent, &h_path); + h_path.dentry = h_dst; + if (!isdir) + rerr = vfsub_unlink(h_dir, &h_path, /*force*/0); + else + rerr = vfsub_rmdir(h_dir, &h_path); + au_dtime_revert(&dt); + if (rerr) { + AuIOErr("failed removing broken entry(%d, %d)\n", err, rerr); + err = -EIO; + } + + out: + dput(dst_parent); + return err; +} + +struct au_cpup_single_args { + int *errp; + struct dentry *dentry; + aufs_bindex_t bdst, bsrc; + loff_t len; + unsigned int flags; + struct dentry *dst_parent; +}; + +static void au_call_cpup_single(void *args) +{ + struct au_cpup_single_args *a = args; + *a->errp = au_cpup_single(a->dentry, a->bdst, a->bsrc, a->len, + a->flags, a->dst_parent); +} + +int au_sio_cpup_single(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len, unsigned int flags, + struct dentry *dst_parent) +{ + int err, wkq_err; + umode_t mode; + struct dentry *h_dentry; + + h_dentry = au_h_dptr(dentry, bsrc); + mode = h_dentry->d_inode->i_mode & S_IFMT; + if ((mode != S_IFCHR && mode != S_IFBLK) + || capable(CAP_MKNOD)) + err = au_cpup_single(dentry, bdst, bsrc, len, flags, + dst_parent); + else { + struct au_cpup_single_args args = { + .errp = &err, + .dentry = dentry, + .bdst = bdst, + .bsrc = bsrc, + .len = len, + .flags = flags, + .dst_parent = dst_parent + }; + wkq_err = au_wkq_wait(au_call_cpup_single, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + return err; +} + +/* + * copyup the @dentry from the first active lower branch to @bdst, + * using au_cpup_single(). + */ +static int au_cpup_simple(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + unsigned int flags) +{ + int err; + aufs_bindex_t bsrc, bend; + + bend = au_dbend(dentry); + for (bsrc = bdst + 1; bsrc <= bend; bsrc++) + if (au_h_dptr(dentry, bsrc)) + break; + + err = au_lkup_neg(dentry, bdst); + if (!err) { + err = au_cpup_single(dentry, bdst, bsrc, len, flags, NULL); + if (!err) + return 0; /* success */ + + /* revert */ + au_set_h_dptr(dentry, bdst, NULL); + au_set_dbstart(dentry, bsrc); + } + + return err; +} + +struct au_cpup_simple_args { + int *errp; + struct dentry *dentry; + aufs_bindex_t bdst; + loff_t len; + unsigned int flags; +}; + +static void au_call_cpup_simple(void *args) +{ + struct au_cpup_simple_args *a = args; + *a->errp = au_cpup_simple(a->dentry, a->bdst, a->len, a->flags); +} + +int au_sio_cpup_simple(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + unsigned int flags) +{ + int err, wkq_err; + unsigned char do_sio; + struct dentry *parent; + struct inode *h_dir; + + parent = dget_parent(dentry); + h_dir = au_h_iptr(parent->d_inode, bdst); + do_sio = !!au_test_h_perm_sio(h_dir, MAY_EXEC | MAY_WRITE); + if (!do_sio) { + /* + * testing CAP_MKNOD is for generic fs, + * but CAP_FSETID is for xfs only, currently. + */ + umode_t mode = dentry->d_inode->i_mode; + do_sio = (((mode & (S_IFCHR | S_IFBLK)) + && !capable(CAP_MKNOD)) + || ((mode & (S_ISUID | S_ISGID)) + && !capable(CAP_FSETID))); + } + if (!do_sio) + err = au_cpup_simple(dentry, bdst, len, flags); + else { + struct au_cpup_simple_args args = { + .errp = &err, + .dentry = dentry, + .bdst = bdst, + .len = len, + .flags = flags + }; + wkq_err = au_wkq_wait(au_call_cpup_simple, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + dput(parent); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * copyup the deleted file for writing. + */ +static int au_do_cpup_wh(struct dentry *dentry, aufs_bindex_t bdst, + struct dentry *wh_dentry, struct file *file, + loff_t len) +{ + int err; + aufs_bindex_t bstart; + struct au_dinfo *dinfo; + struct dentry *h_d_dst, *h_d_start; + + dinfo = au_di(dentry); + bstart = dinfo->di_bstart; + h_d_dst = dinfo->di_hdentry[0 + bdst].hd_dentry; + dinfo->di_bstart = bdst; + dinfo->di_hdentry[0 + bdst].hd_dentry = wh_dentry; + h_d_start = dinfo->di_hdentry[0 + bstart].hd_dentry; + if (file) + dinfo->di_hdentry[0 + bstart].hd_dentry + = au_h_fptr(file, au_fbstart(file))->f_dentry; + err = au_cpup_single(dentry, bdst, bstart, len, !AuCpup_DTIME, + /*h_parent*/NULL); + if (!err && file) { + err = au_reopen_nondir(file); + dinfo->di_hdentry[0 + bstart].hd_dentry = h_d_start; + } + dinfo->di_hdentry[0 + bdst].hd_dentry = h_d_dst; + dinfo->di_bstart = bstart; + + return err; +} + +static int au_cpup_wh(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + struct file *file) +{ + int err; + struct au_dtime dt; + struct dentry *parent, *h_parent, *wh_dentry; + struct au_branch *br; + struct path h_path; + + br = au_sbr(dentry->d_sb, bdst); + parent = dget_parent(dentry); + h_parent = au_h_dptr(parent, bdst); + wh_dentry = au_whtmp_lkup(h_parent, br, &dentry->d_name); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out; + + h_path.dentry = h_parent; + h_path.mnt = br->br_mnt; + au_dtime_store(&dt, parent, &h_path); + err = au_do_cpup_wh(dentry, bdst, wh_dentry, file, len); + if (unlikely(err)) + goto out_wh; + + dget(wh_dentry); + h_path.dentry = wh_dentry; + err = vfsub_unlink(h_parent->d_inode, &h_path, /*force*/0); + if (unlikely(err)) { + AuIOErr("failed remove copied-up tmp file %.*s(%d)\n", + AuDLNPair(wh_dentry), err); + err = -EIO; + } + au_dtime_revert(&dt); + au_set_hi_wh(dentry->d_inode, bdst, wh_dentry); + + out_wh: + dput(wh_dentry); + out: + dput(parent); + return err; +} + +struct au_cpup_wh_args { + int *errp; + struct dentry *dentry; + aufs_bindex_t bdst; + loff_t len; + struct file *file; +}; + +static void au_call_cpup_wh(void *args) +{ + struct au_cpup_wh_args *a = args; + *a->errp = au_cpup_wh(a->dentry, a->bdst, a->len, a->file); +} + +int au_sio_cpup_wh(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + struct file *file) +{ + int err, wkq_err; + struct dentry *parent, *h_orph, *h_parent, *h_dentry; + struct inode *dir, *h_dir, *h_tmpdir, *h_inode; + struct au_wbr *wbr; + + parent = dget_parent(dentry); + dir = parent->d_inode; + h_orph = NULL; + h_parent = NULL; + h_dir = au_igrab(au_h_iptr(dir, bdst)); + h_tmpdir = h_dir; + if (!h_dir->i_nlink) { + wbr = au_sbr(dentry->d_sb, bdst)->br_wbr; + h_orph = wbr->wbr_orph; + + h_parent = dget(au_h_dptr(parent, bdst)); + au_set_h_dptr(parent, bdst, NULL); + au_set_h_dptr(parent, bdst, dget(h_orph)); + h_tmpdir = h_orph->d_inode; + au_set_h_iptr(dir, bdst, NULL, 0); + au_set_h_iptr(dir, bdst, au_igrab(h_tmpdir), /*flags*/0); + + /* this temporary unlock is safe */ + if (file) + h_dentry = au_h_fptr(file, au_fbstart(file))->f_dentry; + else + h_dentry = au_h_dptr(dentry, au_dbstart(dentry)); + h_inode = h_dentry->d_inode; + IMustLock(h_inode); + mutex_unlock(&h_inode->i_mutex); + mutex_lock_nested(&h_tmpdir->i_mutex, AuLsc_I_PARENT2); + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + } + + if (!au_test_h_perm_sio(h_tmpdir, MAY_EXEC | MAY_WRITE)) + err = au_cpup_wh(dentry, bdst, len, file); + else { + struct au_cpup_wh_args args = { + .errp = &err, + .dentry = dentry, + .bdst = bdst, + .len = len, + .file = file + }; + wkq_err = au_wkq_wait(au_call_cpup_wh, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + if (h_orph) { + mutex_unlock(&h_tmpdir->i_mutex); + au_set_h_iptr(dir, bdst, NULL, 0); + au_set_h_iptr(dir, bdst, au_igrab(h_dir), /*flags*/0); + au_set_h_dptr(parent, bdst, NULL); + au_set_h_dptr(parent, bdst, h_parent); + } + iput(h_dir); + dput(parent); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * generic routine for both of copy-up and copy-down. + */ +/* cf. revalidate function in file.c */ +int au_cp_dirs(struct dentry *dentry, aufs_bindex_t bdst, + int (*cp)(struct dentry *dentry, aufs_bindex_t bdst, + struct dentry *h_parent, void *arg), + void *arg) +{ + int err; + struct au_pin pin; + struct dentry *d, *parent, *h_parent, *real_parent; + + err = 0; + parent = dget_parent(dentry); + if (IS_ROOT(parent)) + goto out; + + au_pin_init(&pin, dentry, bdst, AuLsc_DI_PARENT2, AuLsc_I_PARENT2, + au_opt_udba(dentry->d_sb), AuPin_MNT_WRITE); + + /* do not use au_dpage */ + real_parent = parent; + while (1) { + dput(parent); + parent = dget_parent(dentry); + h_parent = au_h_dptr(parent, bdst); + if (h_parent) + goto out; /* success */ + + /* find top dir which is necessary to cpup */ + do { + d = parent; + dput(parent); + parent = dget_parent(d); + di_read_lock_parent3(parent, !AuLock_IR); + h_parent = au_h_dptr(parent, bdst); + di_read_unlock(parent, !AuLock_IR); + } while (!h_parent); + + if (d != real_parent) + di_write_lock_child3(d); + + /* somebody else might create while we were sleeping */ + if (!au_h_dptr(d, bdst) || !au_h_dptr(d, bdst)->d_inode) { + if (au_h_dptr(d, bdst)) + au_update_dbstart(d); + + au_pin_set_dentry(&pin, d); + err = au_do_pin(&pin); + if (!err) { + err = cp(d, bdst, h_parent, arg); + au_unpin(&pin); + } + } + + if (d != real_parent) + di_write_unlock(d); + if (unlikely(err)) + break; + } + + out: + dput(parent); + return err; +} + +static int au_cpup_dir(struct dentry *dentry, aufs_bindex_t bdst, + struct dentry *h_parent __maybe_unused , + void *arg __maybe_unused) +{ + return au_sio_cpup_simple(dentry, bdst, -1, AuCpup_DTIME); +} + +int au_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst) +{ + return au_cp_dirs(dentry, bdst, au_cpup_dir, NULL); +} + +int au_test_and_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst) +{ + int err; + struct dentry *parent; + struct inode *dir; + + parent = dget_parent(dentry); + dir = parent->d_inode; + err = 0; + if (au_h_iptr(dir, bdst)) + goto out; + + di_read_unlock(parent, AuLock_IR); + di_write_lock_parent(parent); + /* someone else might change our inode while we were sleeping */ + if (!au_h_iptr(dir, bdst)) + err = au_cpup_dirs(dentry, bdst); + di_downgrade_lock(parent, AuLock_IR); + + out: + dput(parent); + return err; +} diff -urN ./fs/aufs/cpup.h ./fs/aufs/cpup.h --- ./fs/aufs/cpup.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/cpup.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,68 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * copy-up/down functions + */ + +#ifndef __AUFS_CPUP_H__ +#define __AUFS_CPUP_H__ + +#ifdef __KERNEL__ + +#include +#include + +void au_cpup_attr_flags(struct inode *dst, struct inode *src); +void au_cpup_attr_timesizes(struct inode *inode); +void au_cpup_attr_nlink(struct inode *inode, int force); +void au_cpup_attr_changeable(struct inode *inode); +void au_cpup_igen(struct inode *inode, struct inode *h_inode); +void au_cpup_attr_all(struct inode *inode, int force); + +/* ---------------------------------------------------------------------- */ + +/* cpup flags */ +#define AuCpup_DTIME 1 /* do dtime_store/revert */ +#define AuCpup_KEEPLINO (1 << 1) /* do not clear the lower xino, + for link(2) */ +#define au_ftest_cpup(flags, name) ((flags) & AuCpup_##name) +#define au_fset_cpup(flags, name) { (flags) |= AuCpup_##name; } +#define au_fclr_cpup(flags, name) { (flags) &= ~AuCpup_##name; } + +int au_copy_file(struct file *dst, struct file *src, loff_t len); +int au_sio_cpup_single(struct dentry *dentry, aufs_bindex_t bdst, + aufs_bindex_t bsrc, loff_t len, unsigned int flags, + struct dentry *dst_parent); +int au_sio_cpup_simple(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + unsigned int flags); +int au_sio_cpup_wh(struct dentry *dentry, aufs_bindex_t bdst, loff_t len, + struct file *file); + +int au_cp_dirs(struct dentry *dentry, aufs_bindex_t bdst, + int (*cp)(struct dentry *dentry, aufs_bindex_t bdst, + struct dentry *h_parent, void *arg), + void *arg); +int au_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst); +int au_test_and_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst); + +/* ---------------------------------------------------------------------- */ + +/* keep timestamps when copyup */ +struct au_dtime { + struct dentry *dt_dentry; + struct path dt_h_path; + struct timespec dt_atime, dt_mtime; +}; +void au_dtime_store(struct au_dtime *dt, struct dentry *dentry, + struct path *h_path); +void au_dtime_revert(struct au_dtime *dt); + +#endif /* __KERNEL__ */ +#endif /* __AUFS_CPUP_H__ */ diff -urN ./fs/aufs/dbgaufs.c ./fs/aufs/dbgaufs.c --- ./fs/aufs/dbgaufs.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dbgaufs.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,304 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * debugfs interface + */ + +#include +#include "aufs.h" + +#ifndef CONFIG_SYSFS +#error DEBUG_FS depends upon SYSFS +#endif + +static struct dentry *dbgaufs; +static const mode_t dbgaufs_mode = S_IRUSR | S_IRGRP | S_IROTH; + +/* 20 is max digits length of ulong 64 */ +struct dbgaufs_arg { + int n; + char a[20 * 4]; +}; + +/* + * common function for all XINO files + */ +static int dbgaufs_xi_release(struct inode *inode __maybe_unused, + struct file *file) +{ + kfree(file->private_data); + return 0; +} + +static int dbgaufs_xi_open(struct file *xf, struct file *file, int do_fcnt) +{ + int err; + struct kstat st; + struct dbgaufs_arg *p; + + err = -ENOMEM; + p = kmalloc(sizeof(*p), GFP_NOFS); + if (unlikely(!p)) + goto out; + + err = 0; + p->n = 0; + file->private_data = p; + if (!xf) + goto out; + + err = vfs_getattr(xf->f_vfsmnt, xf->f_dentry, &st); + if (!err) { + if (do_fcnt) + p->n = snprintf + (p->a, sizeof(p->a), "%ld, %llux%lu %lld\n", + (long)file_count(xf), st.blocks, st.blksize, + (long long)st.size); + else + p->n = snprintf(p->a, sizeof(p->a), "%llux%lu %lld\n", + st.blocks, st.blksize, + (long long)st.size); + AuDebugOn(p->n >= sizeof(p->a)); + } else { + p->n = snprintf(p->a, sizeof(p->a), "err %d\n", err); + err = 0; + } + + out: + return err; + +} + +static ssize_t dbgaufs_xi_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct dbgaufs_arg *p; + + p = file->private_data; + return simple_read_from_buffer(buf, count, ppos, p->a, p->n); +} + +/* ---------------------------------------------------------------------- */ + +static int dbgaufs_xib_open(struct inode *inode, struct file *file) +{ + int err; + struct au_sbinfo *sbinfo; + struct super_block *sb; + + sbinfo = inode->i_private; + sb = sbinfo->si_sb; + si_noflush_read_lock(sb); + err = dbgaufs_xi_open(sbinfo->si_xib, file, /*do_fcnt*/0); + si_read_unlock(sb); + return err; +} + +static const struct file_operations dbgaufs_xib_fop = { + .open = dbgaufs_xib_open, + .release = dbgaufs_xi_release, + .read = dbgaufs_xi_read +}; + +/* ---------------------------------------------------------------------- */ + +#define DbgaufsXi_PREFIX "xi" + +static int dbgaufs_xino_open(struct inode *inode, struct file *file) +{ + int err; + long l; + struct au_sbinfo *sbinfo; + struct super_block *sb; + struct file *xf; + struct qstr *name; + + err = -ENOENT; + xf = NULL; + name = &file->f_dentry->d_name; + if (unlikely(name->len < sizeof(DbgaufsXi_PREFIX) + || memcmp(name->name, DbgaufsXi_PREFIX, + sizeof(DbgaufsXi_PREFIX) - 1))) + goto out; + err = strict_strtol(name->name + sizeof(DbgaufsXi_PREFIX) - 1, 10, &l); + if (unlikely(err)) + goto out; + + sbinfo = inode->i_private; + sb = sbinfo->si_sb; + si_noflush_read_lock(sb); + if (l <= au_sbend(sb)) { + xf = au_sbr(sb, (aufs_bindex_t)l)->br_xino.xi_file; + err = dbgaufs_xi_open(xf, file, /*do_fcnt*/1); + } else + err = -ENOENT; + si_read_unlock(sb); + + out: + return err; +} + +static const struct file_operations dbgaufs_xino_fop = { + .open = dbgaufs_xino_open, + .release = dbgaufs_xi_release, + .read = dbgaufs_xi_read +}; + +void dbgaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex) +{ + aufs_bindex_t bend; + struct au_branch *br; + struct au_xino_file *xi; + + if (!au_sbi(sb)->si_dbgaufs) + return; + + bend = au_sbend(sb); + for (; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + xi = &br->br_xino; + if (xi->xi_dbgaufs) { + debugfs_remove(xi->xi_dbgaufs); + xi->xi_dbgaufs = NULL; + } + } +} + +void dbgaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex) +{ + struct au_sbinfo *sbinfo; + struct dentry *parent; + struct au_branch *br; + struct au_xino_file *xi; + aufs_bindex_t bend; + char name[sizeof(DbgaufsXi_PREFIX) + 5]; /* "xi" bindex NULL */ + + sbinfo = au_sbi(sb); + parent = sbinfo->si_dbgaufs; + if (!parent) + return; + + bend = au_sbend(sb); + for (; bindex <= bend; bindex++) { + snprintf(name, sizeof(name), DbgaufsXi_PREFIX "%d", bindex); + br = au_sbr(sb, bindex); + xi = &br->br_xino; + AuDebugOn(xi->xi_dbgaufs); + xi->xi_dbgaufs = debugfs_create_file(name, dbgaufs_mode, parent, + sbinfo, &dbgaufs_xino_fop); + /* ignore an error */ + if (unlikely(!xi->xi_dbgaufs)) + AuWarn1("failed %s under debugfs\n", name); + } +} + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_EXPORT +static int dbgaufs_xigen_open(struct inode *inode, struct file *file) +{ + int err; + struct au_sbinfo *sbinfo; + struct super_block *sb; + + sbinfo = inode->i_private; + sb = sbinfo->si_sb; + si_noflush_read_lock(sb); + err = dbgaufs_xi_open(sbinfo->si_xigen, file, /*do_fcnt*/0); + si_read_unlock(sb); + return err; +} + +static const struct file_operations dbgaufs_xigen_fop = { + .open = dbgaufs_xigen_open, + .release = dbgaufs_xi_release, + .read = dbgaufs_xi_read +}; + +static int dbgaufs_xigen_init(struct au_sbinfo *sbinfo) +{ + int err; + + err = -EIO; + sbinfo->si_dbgaufs_xigen = debugfs_create_file + ("xigen", dbgaufs_mode, sbinfo->si_dbgaufs, sbinfo, + &dbgaufs_xigen_fop); + if (sbinfo->si_dbgaufs_xigen) + err = 0; + + return err; +} +#else +static int dbgaufs_xigen_init(struct au_sbinfo *sbinfo) +{ + return 0; +} +#endif /* CONFIG_AUFS_EXPORT */ + +/* ---------------------------------------------------------------------- */ + +void dbgaufs_si_fin(struct au_sbinfo *sbinfo) +{ + debugfs_remove_recursive(sbinfo->si_dbgaufs); + sbinfo->si_dbgaufs = NULL; + kobject_put(&sbinfo->si_kobj); +} + +int dbgaufs_si_init(struct au_sbinfo *sbinfo) +{ + int err; + char name[SysaufsSiNameLen]; + + err = -ENOENT; + if (!dbgaufs) { + AuErr1("/debug/aufs is uninitialized\n"); + goto out; + } + + err = -EIO; + sysaufs_name(sbinfo, name); + sbinfo->si_dbgaufs = debugfs_create_dir(name, dbgaufs); + if (unlikely(!sbinfo->si_dbgaufs)) + goto out; + kobject_get(&sbinfo->si_kobj); + + sbinfo->si_dbgaufs_xib = debugfs_create_file + ("xib", dbgaufs_mode, sbinfo->si_dbgaufs, sbinfo, + &dbgaufs_xib_fop); + if (unlikely(!sbinfo->si_dbgaufs_xib)) + goto out_dir; + + err = dbgaufs_xigen_init(sbinfo); + if (!err) + goto out; /* success */ + + out_dir: + dbgaufs_si_fin(sbinfo); + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +void dbgaufs_fin(void) +{ + debugfs_remove(dbgaufs); +} + +int __init dbgaufs_init(void) +{ + int err; + + err = -EIO; + dbgaufs = debugfs_create_dir(AUFS_NAME, NULL); + if (dbgaufs) + err = 0; + return err; +} diff -urN ./fs/aufs/dbgaufs.h ./fs/aufs/dbgaufs.h --- ./fs/aufs/dbgaufs.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dbgaufs.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,68 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * debugfs interface + */ + +#ifndef __DBGAUFS_H__ +#define __DBGAUFS_H__ + +#ifdef __KERNEL__ + +#include +#include + +struct au_sbinfo; +#ifdef CONFIG_DEBUG_FS +/* dbgaufs.c */ +void dbgaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex); +void dbgaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex); +void dbgaufs_si_fin(struct au_sbinfo *sbinfo); +int dbgaufs_si_init(struct au_sbinfo *sbinfo); +void dbgaufs_fin(void); +int __init dbgaufs_init(void); + +#else + +static inline +void dbgaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex) +{ + /* empty */ +} + +static inline +void dbgaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex) +{ + /* empty */ +} + +static inline +void dbgaufs_si_fin(struct au_sbinfo *sbinfo) +{ + /* empty */ +} + +static inline +int dbgaufs_si_init(struct au_sbinfo *sbinfo) +{ + return 0; +} + +#define dbgaufs_fin() do {} while (0) + +static inline +int __init dbgaufs_init(void) +{ + return 0; +} +#endif /* CONFIG_DEBUG_FS */ + +#endif /* __KERNEL__ */ +#endif /* __DBGAUFS_H__ */ diff -urN ./fs/aufs/dcsub.c ./fs/aufs/dcsub.c --- ./fs/aufs/dcsub.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dcsub.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,214 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sub-routines for dentry cache + */ + +#include "aufs.h" + +static void au_dpage_free(struct au_dpage *dpage) +{ + int i; + struct dentry **p; + + p = dpage->dentries; + for (i = 0; i < dpage->ndentry; i++) + dput(*p++); + free_page((unsigned long)dpage->dentries); +} + +int au_dpages_init(struct au_dcsub_pages *dpages, gfp_t gfp) +{ + int err; + void *p; + + err = -ENOMEM; + dpages->dpages = kmalloc(sizeof(*dpages->dpages), gfp); + if (unlikely(!dpages->dpages)) + goto out; + + p = (void *)__get_free_page(gfp); + if (unlikely(!p)) + goto out_dpages; + + dpages->dpages[0].ndentry = 0; + dpages->dpages[0].dentries = p; + dpages->ndpage = 1; + return 0; /* success */ + + out_dpages: + kfree(dpages->dpages); + out: + return err; +} + +void au_dpages_free(struct au_dcsub_pages *dpages) +{ + int i; + struct au_dpage *p; + + p = dpages->dpages; + for (i = 0; i < dpages->ndpage; i++) + au_dpage_free(p++); + kfree(dpages->dpages); +} + +static int au_dpages_append(struct au_dcsub_pages *dpages, + struct dentry *dentry, gfp_t gfp) +{ + int err, sz; + struct au_dpage *dpage; + void *p; + + dpage = dpages->dpages + dpages->ndpage - 1; + sz = PAGE_SIZE / sizeof(dentry); + if (unlikely(dpage->ndentry >= sz)) { + AuLabel(new dpage); + err = -ENOMEM; + sz = dpages->ndpage * sizeof(*dpages->dpages); + p = au_kzrealloc(dpages->dpages, sz, + sz + sizeof(*dpages->dpages), gfp); + if (unlikely(!p)) + goto out; + + dpages->dpages = p; + dpage = dpages->dpages + dpages->ndpage; + p = (void *)__get_free_page(gfp); + if (unlikely(!p)) + goto out; + + dpage->ndentry = 0; + dpage->dentries = p; + dpages->ndpage++; + } + + dpage->dentries[dpage->ndentry++] = dget(dentry); + return 0; /* success */ + + out: + return err; +} + +int au_dcsub_pages(struct au_dcsub_pages *dpages, struct dentry *root, + au_dpages_test test, void *arg) +{ + int err; + struct dentry *this_parent = root; + struct list_head *next; + struct super_block *sb = root->d_sb; + + err = 0; + spin_lock(&dcache_lock); + repeat: + next = this_parent->d_subdirs.next; + resume: + if (this_parent->d_sb == sb + && !IS_ROOT(this_parent) + && atomic_read(&this_parent->d_count) + && this_parent->d_inode + && (!test || test(this_parent, arg))) { + err = au_dpages_append(dpages, this_parent, GFP_ATOMIC); + if (unlikely(err)) + goto out; + } + + while (next != &this_parent->d_subdirs) { + struct list_head *tmp = next; + struct dentry *dentry = list_entry(tmp, struct dentry, + d_u.d_child); + next = tmp->next; + if (/*d_unhashed(dentry) || */!dentry->d_inode) + continue; + if (!list_empty(&dentry->d_subdirs)) { + this_parent = dentry; + goto repeat; + } + if (dentry->d_sb == sb + && atomic_read(&dentry->d_count) + && (!test || test(dentry, arg))) { + err = au_dpages_append(dpages, dentry, GFP_ATOMIC); + if (unlikely(err)) + goto out; + } + } + + if (this_parent != root) { + next = this_parent->d_u.d_child.next; + this_parent = this_parent->d_parent; /* dcache_lock is locked */ + goto resume; + } + out: + spin_unlock(&dcache_lock); + return err; +} + +int au_dcsub_pages_rev(struct au_dcsub_pages *dpages, struct dentry *dentry, + int do_include, au_dpages_test test, void *arg) +{ + int err; + + err = 0; + spin_lock(&dcache_lock); + if (do_include && (!test || test(dentry, arg))) { + err = au_dpages_append(dpages, dentry, GFP_ATOMIC); + if (unlikely(err)) + goto out; + } + while (!IS_ROOT(dentry)) { + dentry = dentry->d_parent; /* dcache_lock is locked */ + if (!test || test(dentry, arg)) { + err = au_dpages_append(dpages, dentry, GFP_ATOMIC); + if (unlikely(err)) + break; + } + } + + out: + spin_unlock(&dcache_lock); + + return err; +} + +struct dentry *au_test_subdir(struct dentry *d1, struct dentry *d2) +{ + struct dentry *trap, **dentries; + int err, i, j; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + + trap = ERR_PTR(-ENOMEM); + err = au_dpages_init(&dpages, GFP_NOFS); + if (unlikely(err)) + goto out; + err = au_dcsub_pages_rev(&dpages, d1, /*do_include*/1, NULL, NULL); + if (unlikely(err)) + goto out_dpages; + + trap = d1; + for (i = 0; !err && i < dpages.ndpage; i++) { + dpage = dpages.dpages + i; + dentries = dpage->dentries; + for (j = 0; !err && j < dpage->ndentry; j++) { + struct dentry *d; + + d = dentries[j]; + err = (d == d2); + if (!err) + trap = d; + } + } + if (!err) + trap = NULL; + + out_dpages: + au_dpages_free(&dpages); + out: + return trap; +} diff -urN ./fs/aufs/dcsub.h ./fs/aufs/dcsub.h --- ./fs/aufs/dcsub.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dcsub.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,43 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sub-routines for dentry cache + */ + +#ifndef __AUFS_DCSUB_H__ +#define __AUFS_DCSUB_H__ + +#ifdef __KERNEL__ + +#include + +struct au_dpage { + int ndentry; + struct dentry **dentries; +}; + +struct au_dcsub_pages { + int ndpage; + struct au_dpage *dpages; +}; + +/* ---------------------------------------------------------------------- */ + +int au_dpages_init(struct au_dcsub_pages *dpages, gfp_t gfp); +void au_dpages_free(struct au_dcsub_pages *dpages); +typedef int (*au_dpages_test)(struct dentry *dentry, void *arg); +int au_dcsub_pages(struct au_dcsub_pages *dpages, struct dentry *root, + au_dpages_test test, void *arg); +int au_dcsub_pages_rev(struct au_dcsub_pages *dpages, struct dentry *dentry, + int do_include, au_dpages_test test, void *arg); +struct dentry *au_test_subdir(struct dentry *d1, struct dentry *d2); + +#endif /* __KERNEL__ */ +#endif /* __AUFS_DCSUB_H__ */ diff -urN ./fs/aufs/debug.c ./fs/aufs/debug.c --- ./fs/aufs/debug.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/debug.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,414 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * debug print functions + */ + +#include "aufs.h" + +int aufs_debug; +MODULE_PARM_DESC(debug, "debug print"); +module_param_named(debug, aufs_debug, int, S_IRUGO | S_IWUSR | S_IWGRP); + +char *au_plevel = KERN_DEBUG; +#define dpri(fmt, arg...) do { \ + if (au_debug_test()) \ + printk("%s" fmt, au_plevel, ##arg); \ +} while (0) + +/* ---------------------------------------------------------------------- */ + +void au_dpri_whlist(struct au_nhash *whlist) +{ + int i; + struct hlist_head *head; + struct au_vdir_wh *tpos; + struct hlist_node *pos; + + for (i = 0; i < AuSize_NHASH; i++) { + head = whlist->heads + i; + hlist_for_each_entry(tpos, pos, head, wh_hash) + dpri("b%d, %.*s, %d\n", + tpos->wh_bindex, + tpos->wh_str.len, tpos->wh_str.name, + tpos->wh_str.len); + } +} + +void au_dpri_vdir(struct au_vdir *vdir) +{ + int i; + union au_vdir_deblk_p p; + unsigned char *o; + + if (!vdir || IS_ERR(vdir)) { + dpri("err %ld\n", PTR_ERR(vdir)); + return; + } + + dpri("nblk %d, deblk %p, last{%d, %p}, ver %lu\n", + vdir->vd_nblk, vdir->vd_deblk, + vdir->vd_last.i, vdir->vd_last.p.p, vdir->vd_version); + for (i = 0; i < vdir->vd_nblk; i++) { + p.deblk = vdir->vd_deblk[i]; + o = p.p; + dpri("[%d]: %p\n", i, o); + } +} + +static int do_pri_inode(aufs_bindex_t bindex, struct inode *inode, + struct dentry *wh) +{ + char *n = NULL; + int l = 0; + + if (!inode || IS_ERR(inode)) { + dpri("i%d: err %ld\n", bindex, PTR_ERR(inode)); + return -1; + } + + /* the type of i_blocks depends upon CONFIG_LSF */ + BUILD_BUG_ON(sizeof(inode->i_blocks) != sizeof(unsigned long) + && sizeof(inode->i_blocks) != sizeof(u64)); + if (wh) { + n = (void *)wh->d_name.name; + l = wh->d_name.len; + } + + dpri("i%d: i%lu, %s, cnt %d, nl %u, 0%o, sz %llu, blk %llu," + " ct %lld, np %lu, st 0x%lx, f 0x%x, g %x%s%.*s\n", + bindex, + inode->i_ino, inode->i_sb ? au_sbtype(inode->i_sb) : "??", + atomic_read(&inode->i_count), inode->i_nlink, inode->i_mode, + i_size_read(inode), (unsigned long long)inode->i_blocks, + (long long)timespec_to_ns(&inode->i_ctime) & 0x0ffff, + inode->i_mapping ? inode->i_mapping->nrpages : 0, + inode->i_state, inode->i_flags, inode->i_generation, + l ? ", wh " : "", l, n); + return 0; +} + +void au_dpri_inode(struct inode *inode) +{ + struct au_iinfo *iinfo; + aufs_bindex_t bindex; + int err; + + err = do_pri_inode(-1, inode, NULL); + if (err || !au_test_aufs(inode->i_sb)) + return; + + iinfo = au_ii(inode); + if (!iinfo) + return; + dpri("i-1: bstart %d, bend %d, gen %d\n", + iinfo->ii_bstart, iinfo->ii_bend, au_iigen(inode)); + if (iinfo->ii_bstart < 0) + return; + for (bindex = iinfo->ii_bstart; bindex <= iinfo->ii_bend; bindex++) + do_pri_inode(bindex, iinfo->ii_hinode[0 + bindex].hi_inode, + iinfo->ii_hinode[0 + bindex].hi_whdentry); +} + +static int do_pri_dentry(aufs_bindex_t bindex, struct dentry *dentry) +{ + struct dentry *wh = NULL; + + if (!dentry || IS_ERR(dentry)) { + dpri("d%d: err %ld\n", bindex, PTR_ERR(dentry)); + return -1; + } + /* do not call dget_parent() here */ + dpri("d%d: %.*s?/%.*s, %s, cnt %d, flags 0x%x\n", + bindex, + AuDLNPair(dentry->d_parent), AuDLNPair(dentry), + dentry->d_sb ? au_sbtype(dentry->d_sb) : "??", + atomic_read(&dentry->d_count), dentry->d_flags); + if (bindex >= 0 && dentry->d_inode && au_test_aufs(dentry->d_sb)) { + struct au_iinfo *iinfo = au_ii(dentry->d_inode); + if (iinfo) + wh = iinfo->ii_hinode[0 + bindex].hi_whdentry; + } + do_pri_inode(bindex, dentry->d_inode, wh); + return 0; +} + +void au_dpri_dentry(struct dentry *dentry) +{ + struct au_dinfo *dinfo; + aufs_bindex_t bindex; + int err; + + err = do_pri_dentry(-1, dentry); + if (err || !au_test_aufs(dentry->d_sb)) + return; + + dinfo = au_di(dentry); + if (!dinfo) + return; + dpri("d-1: bstart %d, bend %d, bwh %d, bdiropq %d, gen %d\n", + dinfo->di_bstart, dinfo->di_bend, + dinfo->di_bwh, dinfo->di_bdiropq, au_digen(dentry)); + if (dinfo->di_bstart < 0) + return; + for (bindex = dinfo->di_bstart; bindex <= dinfo->di_bend; bindex++) + do_pri_dentry(bindex, dinfo->di_hdentry[0 + bindex].hd_dentry); +} + +static int do_pri_file(aufs_bindex_t bindex, struct file *file) +{ + char a[32]; + + if (!file || IS_ERR(file)) { + dpri("f%d: err %ld\n", bindex, PTR_ERR(file)); + return -1; + } + a[0] = 0; + if (bindex < 0 + && file->f_dentry + && au_test_aufs(file->f_dentry->d_sb) + && au_fi(file)) + snprintf(a, sizeof(a), ", mmapped %d", au_test_mmapped(file)); + dpri("f%d: mode 0x%x, flags 0%o, cnt %ld, pos %llu%s\n", + bindex, file->f_mode, file->f_flags, (long)file_count(file), + file->f_pos, a); + if (file->f_dentry) + do_pri_dentry(bindex, file->f_dentry); + return 0; +} + +void au_dpri_file(struct file *file) +{ + struct au_finfo *finfo; + aufs_bindex_t bindex; + int err; + + err = do_pri_file(-1, file); + if (err || !file->f_dentry || !au_test_aufs(file->f_dentry->d_sb)) + return; + + finfo = au_fi(file); + if (!finfo) + return; + if (finfo->fi_bstart < 0) + return; + for (bindex = finfo->fi_bstart; bindex <= finfo->fi_bend; bindex++) { + struct au_hfile *hf; + + hf = finfo->fi_hfile + bindex; + do_pri_file(bindex, hf ? hf->hf_file : NULL); + } +} + +static int do_pri_br(aufs_bindex_t bindex, struct au_branch *br) +{ + struct vfsmount *mnt; + struct super_block *sb; + + if (!br || IS_ERR(br)) + goto out; + mnt = br->br_mnt; + if (!mnt || IS_ERR(mnt)) + goto out; + sb = mnt->mnt_sb; + if (!sb || IS_ERR(sb)) + goto out; + + dpri("s%d: {perm 0x%x, cnt %d, wbr %p}, " + "%s, dev 0x%02x%02x, flags 0x%lx, cnt(BIAS) %d, active %d, " + "xino %d\n", + bindex, br->br_perm, atomic_read(&br->br_count), br->br_wbr, + au_sbtype(sb), MAJOR(sb->s_dev), MINOR(sb->s_dev), + sb->s_flags, sb->s_count - S_BIAS, + atomic_read(&sb->s_active), !!br->br_xino.xi_file); + return 0; + + out: + dpri("s%d: err %ld\n", bindex, PTR_ERR(br)); + return -1; +} + +void au_dpri_sb(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + aufs_bindex_t bindex; + int err; + /* to reuduce stack size */ + struct { + struct vfsmount mnt; + struct au_branch fake; + } *a; + + /* this function can be called from magic sysrq */ + a = kzalloc(sizeof(*a), GFP_ATOMIC); + if (unlikely(!a)) { + dpri("no memory\n"); + return; + } + + a->mnt.mnt_sb = sb; + a->fake.br_perm = 0; + a->fake.br_mnt = &a->mnt; + a->fake.br_xino.xi_file = NULL; + atomic_set(&a->fake.br_count, 0); + smp_mb(); /* atomic_set */ + err = do_pri_br(-1, &a->fake); + kfree(a); + dpri("dev 0x%x\n", sb->s_dev); + if (err || !au_test_aufs(sb)) + return; + + sbinfo = au_sbi(sb); + if (!sbinfo) + return; + dpri("nw %d, gen %u, kobj %d\n", + atomic_read(&sbinfo->si_nowait.nw_len), sbinfo->si_generation, + atomic_read(&sbinfo->si_kobj.kref.refcount)); + for (bindex = 0; bindex <= sbinfo->si_bend; bindex++) + do_pri_br(bindex, sbinfo->si_branch[0 + bindex]); +} + +/* ---------------------------------------------------------------------- */ + +void au_dbg_sleep_jiffy(int jiffy) +{ + while (jiffy) + jiffy = schedule_timeout_uninterruptible(jiffy); +} + +void au_dbg_iattr(struct iattr *ia) +{ +#define AuBit(name) if (ia->ia_valid & ATTR_ ## name) \ + dpri(#name "\n") + AuBit(MODE); + AuBit(UID); + AuBit(GID); + AuBit(SIZE); + AuBit(ATIME); + AuBit(MTIME); + AuBit(CTIME); + AuBit(ATIME_SET); + AuBit(MTIME_SET); + AuBit(FORCE); + AuBit(ATTR_FLAG); + AuBit(KILL_SUID); + AuBit(KILL_SGID); + AuBit(FILE); + AuBit(KILL_PRIV); + AuBit(OPEN); + AuBit(TIMES_SET); +#undef AuBit + dpri("ia_file %p\n", ia->ia_file); +} + +/* ---------------------------------------------------------------------- */ + +void au_dbg_verify_dir_parent(struct dentry *dentry, unsigned int sigen) +{ + struct dentry *parent; + + parent = dget_parent(dentry); + AuDebugOn(!S_ISDIR(dentry->d_inode->i_mode) + || IS_ROOT(dentry) + || au_digen(parent) != sigen); + dput(parent); +} + +void au_dbg_verify_nondir_parent(struct dentry *dentry, unsigned int sigen) +{ + struct dentry *parent; + + parent = dget_parent(dentry); + AuDebugOn(S_ISDIR(dentry->d_inode->i_mode) + || au_digen(parent) != sigen); + dput(parent); +} + +void au_dbg_verify_gen(struct dentry *parent, unsigned int sigen) +{ + int err, i, j; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + struct dentry **dentries; + + err = au_dpages_init(&dpages, GFP_NOFS); + AuDebugOn(err); + err = au_dcsub_pages_rev(&dpages, parent, /*do_include*/1, NULL, NULL); + AuDebugOn(err); + for (i = dpages.ndpage - 1; !err && i >= 0; i--) { + dpage = dpages.dpages + i; + dentries = dpage->dentries; + for (j = dpage->ndentry - 1; !err && j >= 0; j--) + AuDebugOn(au_digen(dentries[j]) != sigen); + } + au_dpages_free(&dpages); +} + +void au_dbg_verify_hf(struct au_finfo *finfo) +{ + struct au_hfile *hf; + aufs_bindex_t bend, bindex; + + if (finfo->fi_bstart >= 0) { + bend = finfo->fi_bend; + for (bindex = finfo->fi_bstart; bindex <= bend; bindex++) { + hf = finfo->fi_hfile + bindex; + AuDebugOn(hf->hf_file || hf->hf_br); + } + } +} + +void au_dbg_verify_kthread(void) +{ + if (au_test_wkq(current)) { + au_dbg_blocked(); + BUG(); + } +} + +/* ---------------------------------------------------------------------- */ + +void au_debug_sbinfo_init(struct au_sbinfo *sbinfo __maybe_unused) +{ +#ifdef AuForceNoPlink + au_opt_clr(sbinfo->si_mntflags, PLINK); +#endif +#ifdef AuForceNoXino + au_opt_clr(sbinfo->si_mntflags, XINO); +#endif +#ifdef AuForceNoRefrof + au_opt_clr(sbinfo->si_mntflags, REFROF); +#endif +#ifdef AuForceHinotify + au_opt_set_udba(sbinfo->si_mntflags, UDBA_HINOTIFY); +#endif +} + +int __init au_debug_init(void) +{ + aufs_bindex_t bindex; + struct au_vdir_destr destr; + + bindex = -1; + AuDebugOn(bindex >= 0); + + destr.len = -1; + AuDebugOn(destr.len < NAME_MAX); + +#ifdef CONFIG_4KSTACKS + AuWarn("CONFIG_4KSTACKS is defined.\n"); +#endif + +#ifdef AuForceNoBrs + sysaufs_brs = 0; +#endif + + return 0; +} diff -urN ./fs/aufs/debug.h ./fs/aufs/debug.h --- ./fs/aufs/debug.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/debug.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,243 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * debug print functions + */ + +#ifndef __AUFS_DEBUG_H__ +#define __AUFS_DEBUG_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include +#include + +#ifdef CONFIG_AUFS_DEBUG +#define AuDebugOn(a) BUG_ON(a) + +/* module parameter */ +extern int aufs_debug; +static inline void au_debug(int n) +{ + aufs_debug = n; + smp_mb(); +} + +static inline int au_debug_test(void) +{ + return aufs_debug; +} +#else +#define AuDebugOn(a) do {} while (0) +#define au_debug() do {} while (0) +static inline int au_debug_test(void) +{ + return 0; +} +#endif /* CONFIG_AUFS_DEBUG */ + +/* ---------------------------------------------------------------------- */ + +/* debug print */ + +#define AuDpri(lvl, fmt, arg...) \ + printk(lvl AUFS_NAME " %s:%d:%s[%d]: " fmt, \ + __func__, __LINE__, current->comm, current->pid, ##arg) +#define AuDbg(fmt, arg...) do { \ + if (au_debug_test()) \ + AuDpri(KERN_DEBUG, fmt, ##arg); \ +} while (0) +#define AuLabel(l) AuDbg(#l "\n") +#define AuInfo(fmt, arg...) AuDpri(KERN_INFO, fmt, ##arg) +#define AuWarn(fmt, arg...) AuDpri(KERN_WARNING, fmt, ##arg) +#define AuErr(fmt, arg...) AuDpri(KERN_ERR, fmt, ##arg) +#define AuIOErr(fmt, arg...) AuErr("I/O Error, " fmt, ##arg) +#define AuWarn1(fmt, arg...) do { \ + static unsigned char _c; \ + if (!_c++) \ + AuWarn(fmt, ##arg); \ +} while (0) + +#define AuErr1(fmt, arg...) do { \ + static unsigned char _c; \ + if (!_c++) \ + AuErr(fmt, ##arg); \ +} while (0) + +#define AuIOErr1(fmt, arg...) do { \ + static unsigned char _c; \ + if (!_c++) \ + AuIOErr(fmt, ##arg); \ +} while (0) + +#define AuUnsupportMsg "This operation is not supported." \ + " Please report this application to aufs-users ML." +#define AuUnsupport(fmt, args...) do { \ + AuErr(AuUnsupportMsg "\n" fmt, ##args); \ + dump_stack(); \ +} while (0) + +#define AuTraceErr(e) do { \ + if (unlikely((e) < 0)) \ + AuDbg("err %d\n", (int)(e)); \ +} while (0) + +#define AuTraceErrPtr(p) do { \ + if (IS_ERR(p)) \ + AuDbg("err %ld\n", PTR_ERR(p)); \ +} while (0) + +/* dirty macros for debug print, use with "%.*s" and caution */ +#define AuLNPair(qstr) (qstr)->len, (qstr)->name +#define AuDLNPair(d) AuLNPair(&(d)->d_name) + +/* ---------------------------------------------------------------------- */ + +struct au_sbinfo; +struct au_finfo; +#ifdef CONFIG_AUFS_DEBUG +extern char *au_plevel; +struct au_nhash; +void au_dpri_whlist(struct au_nhash *whlist); +struct au_vdir; +void au_dpri_vdir(struct au_vdir *vdir); +void au_dpri_inode(struct inode *inode); +void au_dpri_dentry(struct dentry *dentry); +void au_dpri_file(struct file *filp); +void au_dpri_sb(struct super_block *sb); + +void au_dbg_sleep_jiffy(int jiffy); +void au_dbg_iattr(struct iattr *ia); + +void au_dbg_verify_dir_parent(struct dentry *dentry, unsigned int sigen); +void au_dbg_verify_nondir_parent(struct dentry *dentry, unsigned int sigen); +void au_dbg_verify_gen(struct dentry *parent, unsigned int sigen); +void au_dbg_verify_hf(struct au_finfo *finfo); +void au_dbg_verify_kthread(void); + +int __init au_debug_init(void); +void au_debug_sbinfo_init(struct au_sbinfo *sbinfo); +#define AuDbgWhlist(w) do { \ + AuDbg(#w "\n"); \ + au_dpri_whlist(w); \ +} while (0) + +#define AuDbgVdir(v) do { \ + AuDbg(#v "\n"); \ + au_dpri_vdir(v); \ +} while (0) + +#define AuDbgInode(i) do { \ + AuDbg(#i "\n"); \ + au_dpri_inode(i); \ +} while (0) + +#define AuDbgDentry(d) do { \ + AuDbg(#d "\n"); \ + au_dpri_dentry(d); \ +} while (0) + +#define AuDbgFile(f) do { \ + AuDbg(#f "\n"); \ + au_dpri_file(f); \ +} while (0) + +#define AuDbgSb(sb) do { \ + AuDbg(#sb "\n"); \ + au_dpri_sb(sb); \ +} while (0) + +#define AuDbgSleep(sec) do { \ + AuDbg("sleep %d sec\n", sec); \ + ssleep(sec); \ +} while (0) + +#define AuDbgSleepJiffy(jiffy) do { \ + AuDbg("sleep %d jiffies\n", jiffy); \ + au_dbg_sleep_jiffy(jiffy); \ +} while (0) + +#define AuDbgIAttr(ia) do { \ + AuDbg("ia_valid 0x%x\n", (ia)->ia_valid); \ + au_dbg_iattr(ia); \ +} while (0) +#else +static inline void au_dbg_verify_dir_parent(struct dentry *dentry, + unsigned int sigen) +{ + /* empty */ +} +static inline void au_dbg_verify_nondir_parent(struct dentry *dentry, + unsigned int sigen) +{ + /* empty */ +} +static inline void au_dbg_verify_gen(struct dentry *parent, unsigned int sigen) +{ + /* empty */ +} +static inline void au_dbg_verify_hf(struct au_finfo *finfo) +{ + /* empty */ +} +static inline void au_dbg_verify_kthread(void) +{ + /* empty */ +} + +static inline int au_debug_init(void) +{ + return 0; +} +static inline void au_debug_sbinfo_init(struct au_sbinfo *sbinfo) +{ + /* empty */ +} +#define AuDbgWhlist(w) do {} while (0) +#define AuDbgVdir(v) do {} while (0) +#define AuDbgInode(i) do {} while (0) +#define AuDbgDentry(d) do {} while (0) +#define AuDbgFile(f) do {} while (0) +#define AuDbgSb(sb) do {} while (0) +#define AuDbgSleep(sec) do {} while (0) +#define AuDbgSleepJiffy(jiffy) do {} while (0) +#define AuDbgIAttr(ia) do {} while (0) +#endif /* CONFIG_AUFS_DEBUG */ + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_MAGIC_SYSRQ +int __init au_sysrq_init(void); +void au_sysrq_fin(void); + +#ifdef CONFIG_HW_CONSOLE +#define au_dbg_blocked() do { \ + WARN_ON(1); \ + handle_sysrq('w', vc_cons[fg_console].d->vc_tty); \ +} while (0) +#else +#define au_dbg_blocked() do {} while (0) +#endif + +#else +static inline int au_sysrq_init(void) +{ + return 0; +} +#define au_sysrq_fin() do {} while (0) +#define au_dbg_blocked() do {} while (0) +#endif /* CONFIG_AUFS_MAGIC_SYSRQ */ + +#endif /* __KERNEL__ */ +#endif /* __AUFS_DEBUG_H__ */ diff -urN ./fs/aufs/dentry.c ./fs/aufs/dentry.c --- ./fs/aufs/dentry.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dentry.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,858 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * lookup and dentry operations + */ + +#include "aufs.h" + +static void au_h_nd(struct nameidata *h_nd, struct nameidata *nd) +{ + if (nd) { + *h_nd = *nd; + + /* + * gave up supporting LOOKUP_CREATE/OPEN for lower fs, + * due to whiteout and branch permission. + */ + h_nd->flags &= ~(/*LOOKUP_PARENT |*/ LOOKUP_OPEN | LOOKUP_CREATE + | LOOKUP_FOLLOW); + /* unnecessary? */ + h_nd->intent.open.file = NULL; + } else + memset(h_nd, 0, sizeof(*h_nd)); +} + +struct au_lkup_one_args { + struct dentry **errp; + struct qstr *name; + struct dentry *h_parent; + struct au_branch *br; + struct nameidata *nd; +}; + +struct dentry *au_lkup_one(struct qstr *name, struct dentry *h_parent, + struct au_branch *br, struct nameidata *nd) +{ + struct dentry *h_dentry; + int err; + struct nameidata h_nd; + + if (au_test_fs_null_nd(h_parent->d_sb)) + return vfsub_lookup_one_len(name->name, h_parent, name->len); + + au_h_nd(&h_nd, nd); + h_nd.path.dentry = h_parent; + h_nd.path.mnt = br->br_mnt; + + err = __lookup_one_len(name->name, &h_nd.last, NULL, name->len); + h_dentry = ERR_PTR(err); + if (!err) { + path_get(&h_nd.path); + h_dentry = vfsub_lookup_hash(&h_nd); + path_put(&h_nd.path); + } + + return h_dentry; +} + +static void au_call_lkup_one(void *args) +{ + struct au_lkup_one_args *a = args; + *a->errp = au_lkup_one(a->name, a->h_parent, a->br, a->nd); +} + +#define AuLkup_ALLOW_NEG 1 +#define au_ftest_lkup(flags, name) ((flags) & AuLkup_##name) +#define au_fset_lkup(flags, name) { (flags) |= AuLkup_##name; } +#define au_fclr_lkup(flags, name) { (flags) &= ~AuLkup_##name; } + +struct au_do_lookup_args { + unsigned int flags; + mode_t type; + struct nameidata *nd; +}; + +/* + * returns positive/negative dentry, NULL or an error. + * NULL means whiteout-ed or not-found. + */ +static struct dentry* +au_do_lookup(struct dentry *h_parent, struct dentry *dentry, + aufs_bindex_t bindex, struct qstr *wh_name, + struct au_do_lookup_args *args) +{ + struct dentry *h_dentry; + struct inode *h_inode, *inode; + struct qstr *name; + struct au_branch *br; + int wh_found, opq; + unsigned char wh_able; + const unsigned char allow_neg = !!au_ftest_lkup(args->flags, ALLOW_NEG); + + name = &dentry->d_name; + wh_found = 0; + br = au_sbr(dentry->d_sb, bindex); + wh_able = !!au_br_whable(br->br_perm); + if (wh_able) + wh_found = au_wh_test(h_parent, wh_name, br, /*try_sio*/0); + h_dentry = ERR_PTR(wh_found); + if (!wh_found) + goto real_lookup; + if (unlikely(wh_found < 0)) + goto out; + + /* We found a whiteout */ + /* au_set_dbend(dentry, bindex); */ + au_set_dbwh(dentry, bindex); + if (!allow_neg) + return NULL; /* success */ + + real_lookup: + h_dentry = au_lkup_one(name, h_parent, br, args->nd); + if (IS_ERR(h_dentry)) + goto out; + + h_inode = h_dentry->d_inode; + if (!h_inode) { + if (!allow_neg) + goto out_neg; + } else if (wh_found + || (args->type && args->type != (h_inode->i_mode & S_IFMT))) + goto out_neg; + + if (au_dbend(dentry) <= bindex) + au_set_dbend(dentry, bindex); + if (au_dbstart(dentry) < 0 || bindex < au_dbstart(dentry)) + au_set_dbstart(dentry, bindex); + au_set_h_dptr(dentry, bindex, h_dentry); + + inode = dentry->d_inode; + if (!h_inode || !S_ISDIR(h_inode->i_mode) || !wh_able + || (inode && !S_ISDIR(inode->i_mode))) + goto out; /* success */ + + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + opq = au_diropq_test(h_dentry, br); + mutex_unlock(&h_inode->i_mutex); + if (opq > 0) + au_set_dbdiropq(dentry, bindex); + else if (unlikely(opq < 0)) { + au_set_h_dptr(dentry, bindex, NULL); + h_dentry = ERR_PTR(opq); + } + goto out; + + out_neg: + dput(h_dentry); + h_dentry = NULL; + out: + return h_dentry; +} + +/* + * returns the number of lower positive dentries, + * otherwise an error. + * can be called at unlinking with @type is zero. + */ +int au_lkup_dentry(struct dentry *dentry, aufs_bindex_t bstart, mode_t type, + struct nameidata *nd) +{ + int npositive, err; + aufs_bindex_t bindex, btail, bdiropq; + unsigned char isdir; + struct qstr whname; + struct au_do_lookup_args args = { + .flags = 0, + .type = type, + .nd = nd + }; + const struct qstr *name = &dentry->d_name; + struct dentry *parent; + struct inode *inode; + + err = -EPERM; + parent = dget_parent(dentry); + if (unlikely(!strncmp(name->name, AUFS_WH_PFX, AUFS_WH_PFX_LEN))) + goto out; + + err = au_wh_name_alloc(&whname, name); + if (unlikely(err)) + goto out; + + inode = dentry->d_inode; + isdir = !!(inode && S_ISDIR(inode->i_mode)); + if (!type) + au_fset_lkup(args.flags, ALLOW_NEG); + + npositive = 0; + btail = au_dbtaildir(parent); + for (bindex = bstart; bindex <= btail; bindex++) { + struct dentry *h_parent, *h_dentry; + struct inode *h_inode, *h_dir; + + h_dentry = au_h_dptr(dentry, bindex); + if (h_dentry) { + if (h_dentry->d_inode) + npositive++; + if (type != S_IFDIR) + break; + continue; + } + h_parent = au_h_dptr(parent, bindex); + if (!h_parent) + continue; + h_dir = h_parent->d_inode; + if (!h_dir || !S_ISDIR(h_dir->i_mode)) + continue; + + mutex_lock_nested(&h_dir->i_mutex, AuLsc_I_PARENT); + h_dentry = au_do_lookup(h_parent, dentry, bindex, &whname, + &args); + mutex_unlock(&h_dir->i_mutex); + err = PTR_ERR(h_dentry); + if (IS_ERR(h_dentry)) + goto out_wh; + au_fclr_lkup(args.flags, ALLOW_NEG); + + if (au_dbwh(dentry) >= 0) + break; + if (!h_dentry) + continue; + h_inode = h_dentry->d_inode; + if (!h_inode) + continue; + npositive++; + if (!args.type) + args.type = h_inode->i_mode & S_IFMT; + if (args.type != S_IFDIR) + break; + else if (isdir) { + /* the type of lower may be different */ + bdiropq = au_dbdiropq(dentry); + if (bdiropq >= 0 && bdiropq <= bindex) + break; + } + } + + if (npositive) { + AuLabel(positive); + au_update_dbstart(dentry); + } + err = npositive; + if (unlikely(!au_opt_test(au_mntflags(dentry->d_sb), UDBA_NONE) + && au_dbstart(dentry) < 0)) + /* both of real entry and whiteout found */ + err = -EIO; + + out_wh: + kfree(whname.name); + out: + dput(parent); + return err; +} + +struct dentry *au_sio_lkup_one(struct qstr *name, struct dentry *parent, + struct au_branch *br) +{ + struct dentry *dentry; + int wkq_err; + + if (!au_test_h_perm_sio(parent->d_inode, MAY_EXEC)) + dentry = au_lkup_one(name, parent, br, /*nd*/NULL); + else { + struct au_lkup_one_args args = { + .errp = &dentry, + .name = name, + .h_parent = parent, + .br = br, + .nd = NULL + }; + + wkq_err = au_wkq_wait(au_call_lkup_one, &args); + if (unlikely(wkq_err)) + dentry = ERR_PTR(wkq_err); + } + + return dentry; +} + +/* + * lookup @dentry on @bindex which should be negative. + */ +int au_lkup_neg(struct dentry *dentry, aufs_bindex_t bindex) +{ + int err; + struct dentry *parent, *h_parent, *h_dentry; + struct qstr *name; + + name = &dentry->d_name; + parent = dget_parent(dentry); + h_parent = au_h_dptr(parent, bindex); + h_dentry = au_sio_lkup_one(name, h_parent, + au_sbr(dentry->d_sb, bindex)); + err = PTR_ERR(h_dentry); + if (IS_ERR(h_dentry)) + goto out; + if (unlikely(h_dentry->d_inode)) { + err = -EIO; + AuIOErr("b%d %.*s should be negative.\n", + bindex, AuDLNPair(h_dentry)); + dput(h_dentry); + goto out; + } + + if (bindex < au_dbstart(dentry)) + au_set_dbstart(dentry, bindex); + if (au_dbend(dentry) < bindex) + au_set_dbend(dentry, bindex); + au_set_h_dptr(dentry, bindex, h_dentry); + err = 0; + + out: + dput(parent); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* subset of struct inode */ +struct au_iattr { + unsigned long i_ino; + /* unsigned int i_nlink; */ + uid_t i_uid; + gid_t i_gid; + u64 i_version; +/* + loff_t i_size; + blkcnt_t i_blocks; +*/ + umode_t i_mode; +}; + +static void au_iattr_save(struct au_iattr *ia, struct inode *h_inode) +{ + ia->i_ino = h_inode->i_ino; + /* ia->i_nlink = h_inode->i_nlink; */ + ia->i_uid = h_inode->i_uid; + ia->i_gid = h_inode->i_gid; + ia->i_version = h_inode->i_version; +/* + ia->i_size = h_inode->i_size; + ia->i_blocks = h_inode->i_blocks; +*/ + ia->i_mode = (h_inode->i_mode & S_IFMT); +} + +static int au_iattr_test(struct au_iattr *ia, struct inode *h_inode) +{ + return ia->i_ino != h_inode->i_ino + /* || ia->i_nlink != h_inode->i_nlink */ + || ia->i_uid != h_inode->i_uid + || ia->i_gid != h_inode->i_gid + || ia->i_version != h_inode->i_version +/* + || ia->i_size != h_inode->i_size + || ia->i_blocks != h_inode->i_blocks +*/ + || ia->i_mode != (h_inode->i_mode & S_IFMT); +} + +static int au_h_verify_dentry(struct dentry *h_dentry, struct dentry *h_parent, + struct au_branch *br) +{ + int err; + struct au_iattr ia; + struct inode *h_inode; + struct dentry *h_d; + struct super_block *h_sb; + + err = 0; + memset(&ia, -1, sizeof(ia)); + h_sb = h_dentry->d_sb; + h_inode = h_dentry->d_inode; + if (h_inode) + au_iattr_save(&ia, h_inode); + else if (au_test_nfs(h_sb) || au_test_fuse(h_sb)) + /* nfs d_revalidate may return 0 for negative dentry */ + /* fuse d_revalidate always return 0 for negative dentry */ + goto out; + + /* main purpose is namei.c:cached_lookup() and d_revalidate */ + h_d = au_lkup_one(&h_dentry->d_name, h_parent, br, /*nd*/NULL); + err = PTR_ERR(h_d); + if (IS_ERR(h_d)) + goto out; + + err = 0; + if (unlikely(h_d != h_dentry + || h_d->d_inode != h_inode + || (h_inode && au_iattr_test(&ia, h_inode)))) + err = au_busy_or_stale(); + dput(h_d); + + out: + AuTraceErr(err); + return err; +} + +int au_h_verify(struct dentry *h_dentry, unsigned int udba, struct inode *h_dir, + struct dentry *h_parent, struct au_branch *br) +{ + int err; + + err = 0; + if (udba == AuOpt_UDBA_REVAL) { + IMustLock(h_dir); + err = (h_dentry->d_parent->d_inode != h_dir); + } else if (udba == AuOpt_UDBA_HINOTIFY) + err = au_h_verify_dentry(h_dentry, h_parent, br); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +static void au_do_refresh_hdentry(struct au_hdentry *p, struct au_dinfo *dinfo, + struct dentry *parent) +{ + struct dentry *h_d, *h_dp; + struct au_hdentry tmp, *q; + struct super_block *sb; + aufs_bindex_t new_bindex, bindex, bend, bwh, bdiropq; + + bend = dinfo->di_bend; + bwh = dinfo->di_bwh; + bdiropq = dinfo->di_bdiropq; + for (bindex = dinfo->di_bstart; bindex <= bend; bindex++, p++) { + h_d = p->hd_dentry; + if (!h_d) + continue; + + h_dp = dget_parent(h_d); + if (h_dp == au_h_dptr(parent, bindex)) { + dput(h_dp); + continue; + } + + new_bindex = au_find_dbindex(parent, h_dp); + dput(h_dp); + if (dinfo->di_bwh == bindex) + bwh = new_bindex; + if (dinfo->di_bdiropq == bindex) + bdiropq = new_bindex; + if (new_bindex < 0) { + au_hdput(p); + p->hd_dentry = NULL; + continue; + } + + /* swap two lower dentries, and loop again */ + q = dinfo->di_hdentry + new_bindex; + tmp = *q; + *q = *p; + *p = tmp; + if (tmp.hd_dentry) { + bindex--; + p--; + } + } + + sb = parent->d_sb; + dinfo->di_bwh = -1; + if (bwh >= 0 && bwh <= au_sbend(sb) && au_sbr_whable(sb, bwh)) + dinfo->di_bwh = bwh; + + dinfo->di_bdiropq = -1; + if (bdiropq >= 0 + && bdiropq <= au_sbend(sb) + && au_sbr_whable(sb, bdiropq)) + dinfo->di_bdiropq = bdiropq; + + bend = au_dbend(parent); + p = dinfo->di_hdentry; + for (bindex = 0; bindex <= bend; bindex++, p++) + if (p->hd_dentry) { + dinfo->di_bstart = bindex; + break; + } + + p = dinfo->di_hdentry + bend; + for (bindex = bend; bindex >= 0; bindex--, p--) + if (p->hd_dentry) { + dinfo->di_bend = bindex; + break; + } +} + +/* + * returns the number of found lower positive dentries, + * otherwise an error. + */ +int au_refresh_hdentry(struct dentry *dentry, mode_t type) +{ + int npositive, err; + unsigned int sigen; + aufs_bindex_t bstart; + struct au_dinfo *dinfo; + struct super_block *sb; + struct dentry *parent; + + sb = dentry->d_sb; + AuDebugOn(IS_ROOT(dentry)); + sigen = au_sigen(sb); + parent = dget_parent(dentry); + AuDebugOn(au_digen(parent) != sigen + || au_iigen(parent->d_inode) != sigen); + + dinfo = au_di(dentry); + err = au_di_realloc(dinfo, au_sbend(sb) + 1); + npositive = err; + if (unlikely(err)) + goto out; + au_do_refresh_hdentry(dinfo->di_hdentry + dinfo->di_bstart, dinfo, + parent); + + npositive = 0; + bstart = au_dbstart(parent); + if (type != S_IFDIR && dinfo->di_bstart == bstart) + goto out_dgen; /* success */ + + npositive = au_lkup_dentry(dentry, bstart, type, /*nd*/NULL); + if (npositive < 0) + goto out; + if (dinfo->di_bwh >= 0 && dinfo->di_bwh <= dinfo->di_bstart) + d_drop(dentry); + + out_dgen: + au_update_digen(dentry); + out: + dput(parent); + AuTraceErr(npositive); + return npositive; +} + +static noinline_for_stack +int au_do_h_d_reval(struct dentry *h_dentry, struct nameidata *nd, + struct dentry *dentry, aufs_bindex_t bindex) +{ + int err, valid; + int (*reval)(struct dentry *, struct nameidata *); + + err = 0; + reval = NULL; + if (h_dentry->d_op) + reval = h_dentry->d_op->d_revalidate; + if (!reval) + goto out; + + AuDbg("b%d\n", bindex); + if (au_test_fs_null_nd(h_dentry->d_sb)) + /* it may return tri-state */ + valid = reval(h_dentry, NULL); + else { + struct nameidata h_nd; + int locked; + struct dentry *parent; + + au_h_nd(&h_nd, nd); + parent = nd->path.dentry; + locked = (nd && nd->path.dentry != dentry); + if (locked) + di_read_lock_parent(parent, AuLock_IR); + BUG_ON(bindex > au_dbend(parent)); + h_nd.path.dentry = au_h_dptr(parent, bindex); + BUG_ON(!h_nd.path.dentry); + h_nd.path.mnt = au_sbr(parent->d_sb, bindex)->br_mnt; + path_get(&h_nd.path); + valid = reval(h_dentry, &h_nd); + path_put(&h_nd.path); + if (locked) + di_read_unlock(parent, AuLock_IR); + } + + if (unlikely(valid < 0)) + err = valid; + else if (!valid) + err = -EINVAL; + + out: + AuTraceErr(err); + return err; +} + +/* todo: remove this */ +static int h_d_revalidate(struct dentry *dentry, struct inode *inode, + struct nameidata *nd, int do_udba) +{ + int err; + umode_t mode, h_mode; + aufs_bindex_t bindex, btail, bstart, ibs, ibe; + unsigned char plus, unhashed, is_root, h_plus; + struct inode *first, *h_inode, *h_cached_inode; + struct dentry *h_dentry; + struct qstr *name, *h_name; + + err = 0; + plus = 0; + mode = 0; + first = NULL; + ibs = -1; + ibe = -1; + unhashed = !!d_unhashed(dentry); + is_root = !!IS_ROOT(dentry); + name = &dentry->d_name; + + /* + * Theoretically, REVAL test should be unnecessary in case of INOTIFY. + * But inotify doesn't fire some necessary events, + * IN_ATTRIB for atime/nlink/pageio + * IN_DELETE for NFS dentry + * Let's do REVAL test too. + */ + if (do_udba && inode) { + mode = (inode->i_mode & S_IFMT); + plus = (inode->i_nlink > 0); + first = au_h_iptr(inode, au_ibstart(inode)); + ibs = au_ibstart(inode); + ibe = au_ibend(inode); + } + + bstart = au_dbstart(dentry); + btail = bstart; + if (inode && S_ISDIR(inode->i_mode)) + btail = au_dbtaildir(dentry); + for (bindex = bstart; bindex <= btail; bindex++) { + h_dentry = au_h_dptr(dentry, bindex); + if (!h_dentry) + continue; + + AuDbg("b%d, %.*s\n", bindex, AuDLNPair(h_dentry)); + h_name = &h_dentry->d_name; + if (unlikely(do_udba + && !is_root + && (unhashed != !!d_unhashed(h_dentry) + || name->len != h_name->len + || memcmp(name->name, h_name->name, name->len)) + )) { + AuDbg("unhash 0x%x 0x%x, %.*s %.*s\n", + unhashed, d_unhashed(h_dentry), + AuDLNPair(dentry), AuDLNPair(h_dentry)); + goto err; + } + + err = au_do_h_d_reval(h_dentry, nd, dentry, bindex); + if (unlikely(err)) + /* do not goto err, to keep the errno */ + break; + + /* todo: plink too? */ + if (!do_udba) + continue; + + /* UDBA tests */ + h_inode = h_dentry->d_inode; + if (unlikely(!!inode != !!h_inode)) + goto err; + + h_plus = plus; + h_mode = mode; + h_cached_inode = h_inode; + if (h_inode) { + h_mode = (h_inode->i_mode & S_IFMT); + h_plus = (h_inode->i_nlink > 0); + } + if (inode && ibs <= bindex && bindex <= ibe) + h_cached_inode = au_h_iptr(inode, bindex); + + if (unlikely(plus != h_plus + || mode != h_mode + || h_cached_inode != h_inode)) + goto err; + continue; + + err: + err = -EINVAL; + break; + } + + return err; +} + +static int simple_reval_dpath(struct dentry *dentry, unsigned int sigen) +{ + int err; + struct dentry *parent; + struct inode *inode; + + inode = dentry->d_inode; + if (au_digen(dentry) == sigen && au_iigen(inode) == sigen) + return 0; + + parent = dget_parent(dentry); + di_read_lock_parent(parent, AuLock_IR); + AuDebugOn(au_digen(parent) != sigen + || au_iigen(parent->d_inode) != sigen); + au_dbg_verify_gen(parent, sigen); + + /* returns a number of positive dentries */ + err = au_refresh_hdentry(dentry, inode->i_mode & S_IFMT); + if (err >= 0) + err = au_refresh_hinode(inode, dentry); + + di_read_unlock(parent, AuLock_IR); + dput(parent); + return err; +} + +int au_reval_dpath(struct dentry *dentry, unsigned int sigen) +{ + int err; + struct dentry *d, *parent; + struct inode *inode; + + if (!au_ftest_si(au_sbi(dentry->d_sb), FAILED_REFRESH_DIRS)) + return simple_reval_dpath(dentry, sigen); + + /* slow loop, keep it simple and stupid */ + /* cf: au_cpup_dirs() */ + err = 0; + parent = NULL; + while (au_digen(dentry) != sigen + || au_iigen(dentry->d_inode) != sigen) { + d = dentry; + while (1) { + dput(parent); + parent = dget_parent(d); + if (au_digen(parent) == sigen + && au_iigen(parent->d_inode) == sigen) + break; + d = parent; + } + + inode = d->d_inode; + if (d != dentry) + di_write_lock_child(d); + + /* someone might update our dentry while we were sleeping */ + if (au_digen(d) != sigen || au_iigen(d->d_inode) != sigen) { + di_read_lock_parent(parent, AuLock_IR); + /* returns a number of positive dentries */ + err = au_refresh_hdentry(d, inode->i_mode & S_IFMT); + if (err >= 0) + err = au_refresh_hinode(inode, d); + di_read_unlock(parent, AuLock_IR); + } + + if (d != dentry) + di_write_unlock(d); + dput(parent); + if (unlikely(err)) + break; + } + + return err; +} + +/* + * if valid returns 1, otherwise 0. + */ +static int aufs_d_revalidate(struct dentry *dentry, struct nameidata *nd) +{ + int valid, err; + unsigned int sigen; + unsigned char do_udba; + struct super_block *sb; + struct inode *inode; + + err = -EINVAL; + sb = dentry->d_sb; + inode = dentry->d_inode; + aufs_read_lock(dentry, AuLock_FLUSH | AuLock_DW); + sigen = au_sigen(sb); + if (au_digen(dentry) != sigen) { + AuDebugOn(IS_ROOT(dentry)); + if (inode) + err = au_reval_dpath(dentry, sigen); + if (unlikely(err)) + goto out_dgrade; + AuDebugOn(au_digen(dentry) != sigen); + } + if (inode && au_iigen(inode) != sigen) { + AuDebugOn(IS_ROOT(dentry)); + err = au_refresh_hinode(inode, dentry); + if (unlikely(err)) + goto out_dgrade; + AuDebugOn(au_iigen(inode) != sigen); + } + di_downgrade_lock(dentry, AuLock_IR); + + AuDebugOn(au_digen(dentry) != sigen); + AuDebugOn(inode && au_iigen(inode) != sigen); + err = -EINVAL; + do_udba = !au_opt_test(au_mntflags(sb), UDBA_NONE); + if (do_udba && inode) { + aufs_bindex_t bstart = au_ibstart(inode); + + if (bstart >= 0 + && au_test_higen(inode, au_h_iptr(inode, bstart))) + goto out; + } + + err = h_d_revalidate(dentry, inode, nd, do_udba); + if (unlikely(!err && do_udba && au_dbstart(dentry) < 0)) + /* both of real entry and whiteout found */ + err = -EIO; + goto out; + + out_dgrade: + di_downgrade_lock(dentry, AuLock_IR); + out: + au_store_oflag(nd, inode); + aufs_read_unlock(dentry, AuLock_IR); + AuTraceErr(err); + valid = !err; + if (!valid) + AuDbg("%.*s invalid\n", AuDLNPair(dentry)); + return valid; +} + +static void aufs_d_release(struct dentry *dentry) +{ + struct au_dinfo *dinfo; + aufs_bindex_t bend, bindex; + + dinfo = dentry->d_fsdata; + if (!dinfo) + return; + + /* dentry may not be revalidated */ + bindex = dinfo->di_bstart; + if (bindex >= 0) { + struct au_hdentry *p; + + bend = dinfo->di_bend; + p = dinfo->di_hdentry + bindex; + while (bindex++ <= bend) { + if (p->hd_dentry) + au_hdput(p); + p++; + } + } + kfree(dinfo->di_hdentry); + au_rwsem_destroy(&dinfo->di_rwsem); + au_cache_free_dinfo(dinfo); + au_hin_di_reinit(dentry); +} + +struct dentry_operations aufs_dop = { + .d_revalidate = aufs_d_revalidate, + .d_release = aufs_d_release +}; diff -urN ./fs/aufs/dentry.h ./fs/aufs/dentry.h --- ./fs/aufs/dentry.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dentry.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,213 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * lookup and dentry operations + */ + +#ifndef __AUFS_DENTRY_H__ +#define __AUFS_DENTRY_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include "rwsem.h" + +/* make a single member structure for future use */ +/* todo: remove this structure */ +struct au_hdentry { + struct dentry *hd_dentry; +}; + +struct au_dinfo { + atomic_t di_generation; + + struct rw_semaphore di_rwsem; + aufs_bindex_t di_bstart, di_bend, di_bwh, di_bdiropq; + struct au_hdentry *di_hdentry; +}; + +/* ---------------------------------------------------------------------- */ + +/* dentry.c */ +extern struct dentry_operations aufs_dop; +struct au_branch; +struct dentry *au_lkup_one(struct qstr *name, struct dentry *h_parent, + struct au_branch *br, struct nameidata *nd); +struct dentry *au_sio_lkup_one(struct qstr *name, struct dentry *parent, + struct au_branch *br); +int au_h_verify(struct dentry *h_dentry, unsigned int udba, struct inode *h_dir, + struct dentry *h_parent, struct au_branch *br); + +int au_lkup_dentry(struct dentry *dentry, aufs_bindex_t bstart, mode_t type, + struct nameidata *nd); +int au_lkup_neg(struct dentry *dentry, aufs_bindex_t bindex); +int au_refresh_hdentry(struct dentry *dentry, mode_t type); +int au_reval_dpath(struct dentry *dentry, unsigned int sigen); + +/* dinfo.c */ +int au_alloc_dinfo(struct dentry *dentry); +int au_di_realloc(struct au_dinfo *dinfo, int nbr); + +void di_read_lock(struct dentry *d, int flags, unsigned int lsc); +void di_read_unlock(struct dentry *d, int flags); +void di_downgrade_lock(struct dentry *d, int flags); +void di_write_lock(struct dentry *d, unsigned int lsc); +void di_write_unlock(struct dentry *d); +void di_write_lock2_child(struct dentry *d1, struct dentry *d2, int isdir); +void di_write_lock2_parent(struct dentry *d1, struct dentry *d2, int isdir); +void di_write_unlock2(struct dentry *d1, struct dentry *d2); + +struct dentry *au_h_dptr(struct dentry *dentry, aufs_bindex_t bindex); +aufs_bindex_t au_dbtail(struct dentry *dentry); +aufs_bindex_t au_dbtaildir(struct dentry *dentry); + +void au_set_h_dptr(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_dentry); +void au_update_digen(struct dentry *dentry); +void au_update_dbrange(struct dentry *dentry, int do_put_zero); +void au_update_dbstart(struct dentry *dentry); +void au_update_dbend(struct dentry *dentry); +int au_find_dbindex(struct dentry *dentry, struct dentry *h_dentry); + +/* ---------------------------------------------------------------------- */ + +static inline struct au_dinfo *au_di(struct dentry *dentry) +{ + return dentry->d_fsdata; +} + +/* ---------------------------------------------------------------------- */ + +/* lock subclass for dinfo */ +enum { + AuLsc_DI_CHILD, /* child first */ + AuLsc_DI_CHILD2, /* rename(2), link(2), and cpup at hinotify */ + AuLsc_DI_CHILD3, /* copyup dirs */ + AuLsc_DI_PARENT, + AuLsc_DI_PARENT2, + AuLsc_DI_PARENT3 +}; + +/* + * di_read_lock_child, di_write_lock_child, + * di_read_lock_child2, di_write_lock_child2, + * di_read_lock_child3, di_write_lock_child3, + * di_read_lock_parent, di_write_lock_parent, + * di_read_lock_parent2, di_write_lock_parent2, + * di_read_lock_parent3, di_write_lock_parent3, + */ +#define AuReadLockFunc(name, lsc) \ +static inline void di_read_lock_##name(struct dentry *d, int flags) \ +{ di_read_lock(d, flags, AuLsc_DI_##lsc); } + +#define AuWriteLockFunc(name, lsc) \ +static inline void di_write_lock_##name(struct dentry *d) \ +{ di_write_lock(d, AuLsc_DI_##lsc); } + +#define AuRWLockFuncs(name, lsc) \ + AuReadLockFunc(name, lsc) \ + AuWriteLockFunc(name, lsc) + +AuRWLockFuncs(child, CHILD); +AuRWLockFuncs(child2, CHILD2); +AuRWLockFuncs(child3, CHILD3); +AuRWLockFuncs(parent, PARENT); +AuRWLockFuncs(parent2, PARENT2); +AuRWLockFuncs(parent3, PARENT3); + +#undef AuReadLockFunc +#undef AuWriteLockFunc +#undef AuRWLockFuncs + +#define DiMustNoWaiters(d) AuRwMustNoWaiters(&au_di(d)->di_rwsem) + +/* ---------------------------------------------------------------------- */ + +/* todo: memory barrier? */ +static inline unsigned int au_digen(struct dentry *d) +{ + return atomic_read(&au_di(d)->di_generation); +} + +static inline void au_h_dentry_init(struct au_hdentry *hdentry) +{ + hdentry->hd_dentry = NULL; +} + +static inline void au_hdput(struct au_hdentry *hd) +{ + dput(hd->hd_dentry); +} + +static inline aufs_bindex_t au_dbstart(struct dentry *dentry) +{ + return au_di(dentry)->di_bstart; +} + +static inline aufs_bindex_t au_dbend(struct dentry *dentry) +{ + return au_di(dentry)->di_bend; +} + +static inline aufs_bindex_t au_dbwh(struct dentry *dentry) +{ + return au_di(dentry)->di_bwh; +} + +static inline aufs_bindex_t au_dbdiropq(struct dentry *dentry) +{ + return au_di(dentry)->di_bdiropq; +} + +/* todo: hard/soft set? */ +static inline void au_set_dbstart(struct dentry *dentry, aufs_bindex_t bindex) +{ + au_di(dentry)->di_bstart = bindex; +} + +static inline void au_set_dbend(struct dentry *dentry, aufs_bindex_t bindex) +{ + au_di(dentry)->di_bend = bindex; +} + +static inline void au_set_dbwh(struct dentry *dentry, aufs_bindex_t bindex) +{ + /* dbwh can be outside of bstart - bend range */ + au_di(dentry)->di_bwh = bindex; +} + +static inline void au_set_dbdiropq(struct dentry *dentry, aufs_bindex_t bindex) +{ + au_di(dentry)->di_bdiropq = bindex; +} + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_HINOTIFY +static inline void au_digen_dec(struct dentry *d) +{ + atomic_dec(&au_di(d)->di_generation); +} + +static inline void au_hin_di_reinit(struct dentry *dentry) +{ + dentry->d_fsdata = NULL; +} +#else +static inline void au_hin_di_reinit(struct dentry *dentry __maybe_unused) +{ + /* empty */ +} +#endif /* CONFIG_AUFS_HINOTIFY */ + +#endif /* __KERNEL__ */ +#endif /* __AUFS_DENTRY_H__ */ diff -urN ./fs/aufs/dinfo.c ./fs/aufs/dinfo.c --- ./fs/aufs/dinfo.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dinfo.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,351 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * dentry private data + */ + +#include "aufs.h" + +int au_alloc_dinfo(struct dentry *dentry) +{ + struct au_dinfo *dinfo; + struct super_block *sb; + int nbr; + + dinfo = au_cache_alloc_dinfo(); + if (unlikely(!dinfo)) + goto out; + + sb = dentry->d_sb; + nbr = au_sbend(sb) + 1; + if (nbr <= 0) + nbr = 1; + dinfo->di_hdentry = kcalloc(nbr, sizeof(*dinfo->di_hdentry), GFP_NOFS); + if (unlikely(!dinfo->di_hdentry)) + goto out_dinfo; + + atomic_set(&dinfo->di_generation, au_sigen(sb)); + /* smp_mb(); */ /* atomic_set */ + init_rwsem(&dinfo->di_rwsem); + down_write_nested(&dinfo->di_rwsem, AuLsc_DI_CHILD); + dinfo->di_bstart = -1; + dinfo->di_bend = -1; + dinfo->di_bwh = -1; + dinfo->di_bdiropq = -1; + + dentry->d_fsdata = dinfo; + dentry->d_op = &aufs_dop; + return 0; /* success */ + + out_dinfo: + au_cache_free_dinfo(dinfo); + out: + return -ENOMEM; +} + +int au_di_realloc(struct au_dinfo *dinfo, int nbr) +{ + int err, sz; + struct au_hdentry *hdp; + + err = -ENOMEM; + sz = sizeof(*hdp) * (dinfo->di_bend + 1); + if (!sz) + sz = sizeof(*hdp); + hdp = au_kzrealloc(dinfo->di_hdentry, sz, sizeof(*hdp) * nbr, GFP_NOFS); + if (hdp) { + dinfo->di_hdentry = hdp; + err = 0; + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +static void do_ii_write_lock(struct inode *inode, unsigned int lsc) +{ + switch (lsc) { + case AuLsc_DI_CHILD: + ii_write_lock_child(inode); + break; + case AuLsc_DI_CHILD2: + ii_write_lock_child2(inode); + break; + case AuLsc_DI_CHILD3: + ii_write_lock_child3(inode); + break; + case AuLsc_DI_PARENT: + ii_write_lock_parent(inode); + break; + case AuLsc_DI_PARENT2: + ii_write_lock_parent2(inode); + break; + case AuLsc_DI_PARENT3: + ii_write_lock_parent3(inode); + break; + default: + BUG(); + } +} + +static void do_ii_read_lock(struct inode *inode, unsigned int lsc) +{ + switch (lsc) { + case AuLsc_DI_CHILD: + ii_read_lock_child(inode); + break; + case AuLsc_DI_CHILD2: + ii_read_lock_child2(inode); + break; + case AuLsc_DI_CHILD3: + ii_read_lock_child3(inode); + break; + case AuLsc_DI_PARENT: + ii_read_lock_parent(inode); + break; + case AuLsc_DI_PARENT2: + ii_read_lock_parent2(inode); + break; + case AuLsc_DI_PARENT3: + ii_read_lock_parent3(inode); + break; + default: + BUG(); + } +} + +void di_read_lock(struct dentry *d, int flags, unsigned int lsc) +{ + down_read_nested(&au_di(d)->di_rwsem, lsc); + if (d->d_inode) { + if (au_ftest_lock(flags, IW)) + do_ii_write_lock(d->d_inode, lsc); + else if (au_ftest_lock(flags, IR)) + do_ii_read_lock(d->d_inode, lsc); + } +} + +void di_read_unlock(struct dentry *d, int flags) +{ + if (d->d_inode) { + if (au_ftest_lock(flags, IW)) + ii_write_unlock(d->d_inode); + else if (au_ftest_lock(flags, IR)) + ii_read_unlock(d->d_inode); + } + up_read(&au_di(d)->di_rwsem); +} + +void di_downgrade_lock(struct dentry *d, int flags) +{ + downgrade_write(&au_di(d)->di_rwsem); + if (d->d_inode && au_ftest_lock(flags, IR)) + ii_downgrade_lock(d->d_inode); +} + +void di_write_lock(struct dentry *d, unsigned int lsc) +{ + down_write_nested(&au_di(d)->di_rwsem, lsc); + if (d->d_inode) + do_ii_write_lock(d->d_inode, lsc); +} + +void di_write_unlock(struct dentry *d) +{ + if (d->d_inode) + ii_write_unlock(d->d_inode); + up_write(&au_di(d)->di_rwsem); +} + +void di_write_lock2_child(struct dentry *d1, struct dentry *d2, int isdir) +{ + AuDebugOn(d1 == d2 + || d1->d_inode == d2->d_inode + || d1->d_sb != d2->d_sb); + + if (isdir && au_test_subdir(d1, d2)) { + di_write_lock_child(d1); + di_write_lock_child2(d2); + } else { + /* there should be no races */ + di_write_lock_child(d2); + di_write_lock_child2(d1); + } +} + +void di_write_lock2_parent(struct dentry *d1, struct dentry *d2, int isdir) +{ + AuDebugOn(d1 == d2 + || d1->d_inode == d2->d_inode + || d1->d_sb != d2->d_sb); + + if (isdir && au_test_subdir(d1, d2)) { + di_write_lock_parent(d1); + di_write_lock_parent2(d2); + } else { + /* there should be no races */ + di_write_lock_parent(d2); + di_write_lock_parent2(d1); + } +} + +void di_write_unlock2(struct dentry *d1, struct dentry *d2) +{ + di_write_unlock(d1); + if (d1->d_inode == d2->d_inode) + up_write(&au_di(d2)->di_rwsem); + else + di_write_unlock(d2); +} + +/* ---------------------------------------------------------------------- */ + +struct dentry *au_h_dptr(struct dentry *dentry, aufs_bindex_t bindex) +{ + struct dentry *d; + + if (au_dbstart(dentry) < 0 || bindex < au_dbstart(dentry)) + return NULL; + AuDebugOn(bindex < 0); + d = au_di(dentry)->di_hdentry[0 + bindex].hd_dentry; + AuDebugOn(d && (atomic_read(&d->d_count) <= 0)); + return d; +} + +aufs_bindex_t au_dbtail(struct dentry *dentry) +{ + aufs_bindex_t bend, bwh; + + bend = au_dbend(dentry); + if (0 <= bend) { + bwh = au_dbwh(dentry); + if (!bwh) + return bwh; + if (0 < bwh && bwh < bend) + return bwh - 1; + } + return bend; +} + +aufs_bindex_t au_dbtaildir(struct dentry *dentry) +{ + aufs_bindex_t bend, bopq; + + bend = au_dbtail(dentry); + if (0 <= bend) { + bopq = au_dbdiropq(dentry); + if (0 <= bopq && bopq < bend) + bend = bopq; + } + return bend; +} + +/* ---------------------------------------------------------------------- */ + +void au_set_h_dptr(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_dentry) +{ + struct au_hdentry *hd = au_di(dentry)->di_hdentry + bindex; + + if (hd->hd_dentry) + au_hdput(hd); + hd->hd_dentry = h_dentry; +} + +void au_update_digen(struct dentry *dentry) +{ + atomic_set(&au_di(dentry)->di_generation, au_sigen(dentry->d_sb)); + /* smp_mb(); */ /* atomic_set */ +} + +void au_update_dbrange(struct dentry *dentry, int do_put_zero) +{ + struct au_dinfo *dinfo; + struct dentry *h_d; + + dinfo = au_di(dentry); + if (!dinfo || dinfo->di_bstart < 0) + return; + + if (do_put_zero) { + aufs_bindex_t bindex, bend; + + bend = dinfo->di_bend; + for (bindex = dinfo->di_bstart; bindex <= bend; bindex++) { + h_d = dinfo->di_hdentry[0 + bindex].hd_dentry; + if (h_d && !h_d->d_inode) + au_set_h_dptr(dentry, bindex, NULL); + } + } + + dinfo->di_bstart = -1; + while (++dinfo->di_bstart <= dinfo->di_bend) + if (dinfo->di_hdentry[0 + dinfo->di_bstart].hd_dentry) + break; + if (dinfo->di_bstart > dinfo->di_bend) { + dinfo->di_bstart = -1; + dinfo->di_bend = -1; + return; + } + + dinfo->di_bend++; + while (0 <= --dinfo->di_bend) + if (dinfo->di_hdentry[0 + dinfo->di_bend].hd_dentry) + break; + AuDebugOn(dinfo->di_bstart > dinfo->di_bend || dinfo->di_bend < 0); +} + +void au_update_dbstart(struct dentry *dentry) +{ + aufs_bindex_t bindex, bend; + struct dentry *h_dentry; + + bend = au_dbend(dentry); + for (bindex = au_dbstart(dentry); bindex <= bend; bindex++) { + h_dentry = au_h_dptr(dentry, bindex); + if (!h_dentry) + continue; + if (h_dentry->d_inode) { + au_set_dbstart(dentry, bindex); + return; + } + au_set_h_dptr(dentry, bindex, NULL); + } +} + +void au_update_dbend(struct dentry *dentry) +{ + aufs_bindex_t bindex, bstart; + struct dentry *h_dentry; + + bstart = au_dbstart(dentry); + for (bindex = au_dbend(dentry); bindex <= bstart; bindex--) { + h_dentry = au_h_dptr(dentry, bindex); + if (!h_dentry) + continue; + if (h_dentry->d_inode) { + au_set_dbend(dentry, bindex); + return; + } + au_set_h_dptr(dentry, bindex, NULL); + } +} + +int au_find_dbindex(struct dentry *dentry, struct dentry *h_dentry) +{ + aufs_bindex_t bindex, bend; + + bend = au_dbend(dentry); + for (bindex = au_dbstart(dentry); bindex <= bend; bindex++) + if (au_h_dptr(dentry, bindex) == h_dentry) + return bindex; + return -1; +} diff -urN ./fs/aufs/dir.c ./fs/aufs/dir.c --- ./fs/aufs/dir.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dir.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,513 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * directory operations + */ + +#include +#include "aufs.h" + +void au_add_nlink(struct inode *dir, struct inode *h_dir) +{ + AuDebugOn(!S_ISDIR(dir->i_mode) || !S_ISDIR(h_dir->i_mode)); + + dir->i_nlink += h_dir->i_nlink - 2; + if (h_dir->i_nlink < 2) + dir->i_nlink += 2; +} + +void au_sub_nlink(struct inode *dir, struct inode *h_dir) +{ + AuDebugOn(!S_ISDIR(dir->i_mode) || !S_ISDIR(h_dir->i_mode)); + + dir->i_nlink -= h_dir->i_nlink - 2; + if (h_dir->i_nlink < 2) + dir->i_nlink -= 2; +} + +/* ---------------------------------------------------------------------- */ + +static int reopen_dir(struct file *file) +{ + int err; + unsigned int flags; + aufs_bindex_t bindex, btail, bstart; + struct dentry *dentry, *h_dentry; + struct file *h_file; + + /* open all lower dirs */ + dentry = file->f_dentry; + bstart = au_dbstart(dentry); + for (bindex = au_fbstart(file); bindex < bstart; bindex++) + au_set_h_fptr(file, bindex, NULL); + au_set_fbstart(file, bstart); + + btail = au_dbtaildir(dentry); + for (bindex = au_fbend(file); btail < bindex; bindex--) + au_set_h_fptr(file, bindex, NULL); + au_set_fbend(file, btail); + + flags = file->f_flags; + for (bindex = bstart; bindex <= btail; bindex++) { + h_dentry = au_h_dptr(dentry, bindex); + if (!h_dentry) + continue; + h_file = au_h_fptr(file, bindex); + if (h_file) + continue; + + h_file = au_h_open(dentry, bindex, flags, file); + err = PTR_ERR(h_file); + if (IS_ERR(h_file)) + goto out; /* close all? */ + au_set_h_fptr(file, bindex, h_file); + } + au_update_figen(file); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + err = 0; + + out: + return err; +} + +static int do_open_dir(struct file *file, int flags) +{ + int err; + aufs_bindex_t bindex, btail; + struct dentry *dentry, *h_dentry; + struct file *h_file; + + err = 0; + dentry = file->f_dentry; + au_set_fvdir_cache(file, NULL); + au_fi(file)->fi_maintain_plink = 0; + file->f_version = dentry->d_inode->i_version; + bindex = au_dbstart(dentry); + au_set_fbstart(file, bindex); + btail = au_dbtaildir(dentry); + au_set_fbend(file, btail); + for (; !err && bindex <= btail; bindex++) { + h_dentry = au_h_dptr(dentry, bindex); + if (!h_dentry) + continue; + + h_file = au_h_open(dentry, bindex, flags, file); + if (IS_ERR(h_file)) { + err = PTR_ERR(h_file); + break; + } + au_set_h_fptr(file, bindex, h_file); + } + au_update_figen(file); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + if (!err) + return 0; /* success */ + + /* close all */ + for (bindex = au_fbstart(file); bindex <= btail; bindex++) + au_set_h_fptr(file, bindex, NULL); + au_set_fbstart(file, -1); + au_set_fbend(file, -1); + return err; +} + +static int aufs_open_dir(struct inode *inode __maybe_unused, + struct file *file) +{ + return au_do_open(file, do_open_dir); +} + +static int aufs_release_dir(struct inode *inode __maybe_unused, + struct file *file) +{ + struct au_vdir *vdir_cache; + struct super_block *sb; + struct au_sbinfo *sbinfo; + + sb = file->f_dentry->d_sb; + si_noflush_read_lock(sb); + fi_write_lock(file); + vdir_cache = au_fvdir_cache(file); + if (vdir_cache) + au_vdir_free(vdir_cache); + if (au_fi(file)->fi_maintain_plink) { + sbinfo = au_sbi(sb); + au_fclr_si(sbinfo, MAINTAIN_PLINK); + wake_up_all(&sbinfo->si_plink_wq); + } + fi_write_unlock(file); + au_finfo_fin(file); + si_read_unlock(sb); + return 0; +} + +/* ---------------------------------------------------------------------- */ + +static int au_do_fsync_dir_no_file(struct dentry *dentry, int datasync) +{ + int err; + aufs_bindex_t bend, bindex; + struct inode *inode; + struct super_block *sb; + + err = 0; + sb = dentry->d_sb; + inode = dentry->d_inode; + IMustLock(inode); + bend = au_dbend(dentry); + for (bindex = au_dbstart(dentry); !err && bindex <= bend; bindex++) { + struct path h_path; + struct inode *h_inode; + + if (au_test_ro(sb, bindex, inode)) + continue; + h_path.dentry = au_h_dptr(dentry, bindex); + if (!h_path.dentry) + continue; + h_inode = h_path.dentry->d_inode; + if (!h_inode) + continue; + + /* no mnt_want_write() */ + /* cf. fs/nsfd/vfs.c and fs/nfsd/nfs4recover.c */ + /* todo: inotiry fired? */ + h_path.mnt = au_sbr_mnt(sb, bindex); + mutex_lock(&h_inode->i_mutex); + err = filemap_fdatawrite(h_inode->i_mapping); + AuDebugOn(!h_inode->i_fop); + if (!err && h_inode->i_fop->fsync) + err = h_inode->i_fop->fsync(NULL, h_path.dentry, + datasync); + if (!err) + err = filemap_fdatawrite(h_inode->i_mapping); + if (!err) + vfsub_update_h_iattr(&h_path, /*did*/NULL); /*ignore*/ + mutex_unlock(&h_inode->i_mutex); + } + + return err; +} + +static int au_do_fsync_dir(struct file *file, int datasync) +{ + int err; + aufs_bindex_t bend, bindex; + struct file *h_file; + struct super_block *sb; + struct inode *inode; + struct mutex *h_mtx; + + err = au_reval_and_lock_fdi(file, reopen_dir, /*wlock*/1); + if (unlikely(err)) + goto out; + + sb = file->f_dentry->d_sb; + inode = file->f_dentry->d_inode; + bend = au_fbend(file); + for (bindex = au_fbstart(file); !err && bindex <= bend; bindex++) { + h_file = au_h_fptr(file, bindex); + if (!h_file || au_test_ro(sb, bindex, inode)) + continue; + + err = vfs_fsync(h_file, h_file->f_dentry, datasync); + if (!err) { + h_mtx = &h_file->f_dentry->d_inode->i_mutex; + mutex_lock(h_mtx); + vfsub_update_h_iattr(&h_file->f_path, /*did*/NULL); + /*ignore*/ + mutex_unlock(h_mtx); + } + } + + out: + return err; +} + +/* + * @file may be NULL + */ +static int aufs_fsync_dir(struct file *file, struct dentry *dentry, + int datasync) +{ + int err; + struct super_block *sb; + + IMustLock(dentry->d_inode); + + err = 0; + sb = dentry->d_sb; + si_noflush_read_lock(sb); + if (file) + err = au_do_fsync_dir(file, datasync); + else { + di_write_lock_child(dentry); + err = au_do_fsync_dir_no_file(dentry, datasync); + } + au_cpup_attr_timesizes(dentry->d_inode); + di_write_unlock(dentry); + if (file) + fi_write_unlock(file); + + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int aufs_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + int err; + struct dentry *dentry; + struct inode *inode; + struct super_block *sb; + + dentry = file->f_dentry; + inode = dentry->d_inode; + IMustLock(inode); + + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, reopen_dir, /*wlock*/1); + if (unlikely(err)) + goto out; + err = au_vdir_init(file); + di_downgrade_lock(dentry, AuLock_IR); + if (unlikely(err)) + goto out_unlock; + + if (!au_test_nfsd(current)) { + err = au_vdir_fill_de(file, dirent, filldir); + fsstack_copy_attr_atime(inode, + au_h_iptr(inode, au_ibstart(inode))); + } else { + /* + * nfsd filldir may call lookup_one_len(), vfs_getattr(), + * encode_fh() and others. + */ + struct inode *h_inode = au_h_iptr(inode, au_ibstart(inode)); + + di_read_unlock(dentry, AuLock_IR); + si_read_unlock(sb); + lockdep_off(); + err = au_vdir_fill_de(file, dirent, filldir); + lockdep_on(); + fsstack_copy_attr_atime(inode, h_inode); + fi_write_unlock(file); + + AuTraceErr(err); + return err; + } + + out_unlock: + di_read_unlock(dentry, AuLock_IR); + fi_write_unlock(file); + out: + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +#define AuTestEmpty_WHONLY 1 +#define AuTestEmpty_CALLED (1 << 2) +#define au_ftest_testempty(flags, name) ((flags) & AuTestEmpty_##name) +#define au_fset_testempty(flags, name) { (flags) |= AuTestEmpty_##name; } +#define au_fclr_testempty(flags, name) { (flags) &= ~AuTestEmpty_##name; } + +struct test_empty_arg { + struct au_nhash *whlist; + unsigned int flags; + int err; + aufs_bindex_t bindex; +}; + +static int test_empty_cb(void *__arg, const char *__name, int namelen, + loff_t offset __maybe_unused, u64 ino __maybe_unused, + unsigned int d_type __maybe_unused) +{ + struct test_empty_arg *arg = __arg; + char *name = (void *)__name; + + arg->err = 0; + au_fset_testempty(arg->flags, CALLED); + /* smp_mb(); */ + if (name[0] == '.' + && (namelen == 1 || (name[1] == '.' && namelen == 2))) + goto out; /* success */ + + if (namelen <= AUFS_WH_PFX_LEN + || memcmp(name, AUFS_WH_PFX, AUFS_WH_PFX_LEN)) { + if (au_ftest_testempty(arg->flags, WHONLY) + && !au_nhash_test_known_wh(arg->whlist, name, namelen)) + arg->err = -ENOTEMPTY; + goto out; + } + + name += AUFS_WH_PFX_LEN; + namelen -= AUFS_WH_PFX_LEN; + if (!au_nhash_test_known_wh(arg->whlist, name, namelen)) + arg->err = au_nhash_append_wh + (arg->whlist, name, namelen, arg->bindex); + + out: + /* smp_mb(); */ + AuTraceErr(arg->err); + return arg->err; +} + +static int do_test_empty(struct dentry *dentry, struct test_empty_arg *arg) +{ + int err; + struct file *h_file; + + h_file = au_h_open(dentry, arg->bindex, + O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_LARGEFILE, + /*file*/NULL); + err = PTR_ERR(h_file); + if (IS_ERR(h_file)) + goto out; + + err = 0; + if (!au_opt_test(au_mntflags(dentry->d_sb), UDBA_NONE) + && !h_file->f_dentry->d_inode->i_nlink) + goto out_put; + + do { + arg->err = 0; + au_fclr_testempty(arg->flags, CALLED); + /* smp_mb(); */ + err = vfsub_readdir(h_file, test_empty_cb, arg); + if (err >= 0) + err = arg->err; + } while (!err && au_ftest_testempty(arg->flags, CALLED)); + + out_put: + fput(h_file); + au_sbr_put(dentry->d_sb, arg->bindex); + out: + return err; +} + +struct do_test_empty_args { + int *errp; + struct dentry *dentry; + struct test_empty_arg *arg; +}; + +static void call_do_test_empty(void *args) +{ + struct do_test_empty_args *a = args; + *a->errp = do_test_empty(a->dentry, a->arg); +} + +static int sio_test_empty(struct dentry *dentry, struct test_empty_arg *arg) +{ + int err, wkq_err; + struct dentry *h_dentry; + struct inode *h_inode; + + h_dentry = au_h_dptr(dentry, arg->bindex); + h_inode = h_dentry->d_inode; + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + err = au_test_h_perm_sio(h_inode, MAY_EXEC | MAY_READ); + mutex_unlock(&h_inode->i_mutex); + if (!err) + err = do_test_empty(dentry, arg); + else { + struct do_test_empty_args args = { + .errp = &err, + .dentry = dentry, + .arg = arg + }; + unsigned int flags = arg->flags; + + wkq_err = au_wkq_wait(call_do_test_empty, &args); + if (unlikely(wkq_err)) + err = wkq_err; + arg->flags = flags; + } + + return err; +} + +int au_test_empty_lower(struct dentry *dentry) +{ + int err; + aufs_bindex_t bindex, bstart, btail; + struct test_empty_arg arg; + struct au_nhash *whlist; + + whlist = au_nhash_new(GFP_NOFS); + err = PTR_ERR(whlist); + if (IS_ERR(whlist)) + goto out; + + bstart = au_dbstart(dentry); + arg.whlist = whlist; + arg.flags = 0; + arg.bindex = bstart; + err = do_test_empty(dentry, &arg); + if (unlikely(err)) + goto out_whlist; + + au_fset_testempty(arg.flags, WHONLY); + btail = au_dbtaildir(dentry); + for (bindex = bstart + 1; !err && bindex <= btail; bindex++) { + struct dentry *h_dentry; + + h_dentry = au_h_dptr(dentry, bindex); + if (h_dentry && h_dentry->d_inode) { + arg.bindex = bindex; + err = do_test_empty(dentry, &arg); + } + } + + out_whlist: + au_nhash_del(whlist); + out: + return err; +} + +int au_test_empty(struct dentry *dentry, struct au_nhash *whlist) +{ + int err; + struct test_empty_arg arg; + aufs_bindex_t bindex, btail; + + err = 0; + arg.whlist = whlist; + arg.flags = AuTestEmpty_WHONLY; + btail = au_dbtaildir(dentry); + for (bindex = au_dbstart(dentry); !err && bindex <= btail; bindex++) { + struct dentry *h_dentry; + + h_dentry = au_h_dptr(dentry, bindex); + if (h_dentry && h_dentry->d_inode) { + arg.bindex = bindex; + err = sio_test_empty(dentry, &arg); + } + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +const struct file_operations aufs_dir_fop = { + .read = generic_read_dir, + .readdir = aufs_readdir, + .unlocked_ioctl = aufs_ioctl_dir, + .open = aufs_open_dir, + .release = aufs_release_dir, + .flush = aufs_flush, + .fsync = aufs_fsync_dir +}; diff -urN ./fs/aufs/dir.h ./fs/aufs/dir.h --- ./fs/aufs/dir.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/dir.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,104 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * directory operations + */ + +#ifndef __AUFS_DIR_H__ +#define __AUFS_DIR_H__ + +#ifdef __KERNEL__ + +#include +#include + +/* ---------------------------------------------------------------------- */ + +/* need to be faster and smaller */ + +#define AuSize_DEBLK 512 +#define AuSize_NHASH 32 + +typedef char au_vdir_deblk_t[AuSize_DEBLK]; + +struct au_nhash { + struct hlist_head heads[AuSize_NHASH]; +}; + +struct au_vdir_destr { + unsigned char len; + char name[0]; +} __packed; + +struct au_vdir_dehstr { + struct hlist_node hash; + struct au_vdir_destr *str; +}; + +struct au_vdir_de { + ino_t de_ino; + unsigned char de_type; + /* caution: packed */ + struct au_vdir_destr de_str; +} __packed; + +struct au_vdir_wh { + struct hlist_node wh_hash; + aufs_bindex_t wh_bindex; + struct au_vdir_destr wh_str; +} __packed; + +union au_vdir_deblk_p { + unsigned char *p; + au_vdir_deblk_t *deblk; + struct au_vdir_de *de; +}; + +struct au_vdir { + au_vdir_deblk_t **vd_deblk; + int vd_nblk; + struct { + int i; + union au_vdir_deblk_p p; + } vd_last; + + unsigned long vd_version; + unsigned long vd_jiffy; +}; + +/* ---------------------------------------------------------------------- */ + +/* dir.c */ +extern const struct file_operations aufs_dir_fop; +void au_add_nlink(struct inode *dir, struct inode *h_dir); +void au_sub_nlink(struct inode *dir, struct inode *h_dir); +int au_test_empty_lower(struct dentry *dentry); +int au_test_empty(struct dentry *dentry, struct au_nhash *whlist); + +/* vdir.c */ +struct au_nhash *au_nhash_new(gfp_t gfp); +void au_nhash_del(struct au_nhash *nhash); +void au_nhash_init(struct au_nhash *nhash); +void au_nhash_move(struct au_nhash *dst, struct au_nhash *src); +void au_nhash_fin(struct au_nhash *nhash); +int au_nhash_test_longer_wh(struct au_nhash *whlist, aufs_bindex_t btgt, + int limit); +int au_nhash_test_known_wh(struct au_nhash *whlist, char *name, int namelen); +int au_nhash_append_wh(struct au_nhash *whlist, char *name, int namelen, + aufs_bindex_t bindex); +void au_vdir_free(struct au_vdir *vdir); +int au_vdir_init(struct file *file); +int au_vdir_fill_de(struct file *file, void *dirent, filldir_t filldir); + +/* ioctl.c */ +long aufs_ioctl_dir(struct file *file, unsigned int cmd, unsigned long arg); + +#endif /* __KERNEL__ */ +#endif /* __AUFS_DIR_H__ */ diff -urN ./fs/aufs/export.c ./fs/aufs/export.c --- ./fs/aufs/export.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/export.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,725 @@ +/* + * Copyright (C) 2005-2009 Junjiro Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * export via nfs + */ + +#include +#include +#include +#include "aufs.h" + +union conv { +#ifdef CONFIG_AUFS_INO_T_64 + __u32 a[2]; +#else + __u32 a[1]; +#endif + ino_t ino; +}; + +static ino_t decode_ino(__u32 *a) +{ + union conv u; + + BUILD_BUG_ON(sizeof(u.ino) != sizeof(u.a)); + u.a[0] = a[0]; +#ifdef CONFIG_AUFS_INO_T_64 + u.a[1] = a[1]; +#endif + return u.ino; +} + +static void encode_ino(__u32 *a, ino_t ino) +{ + union conv u; + + u.ino = ino; + a[0] = u.a[0]; +#ifdef CONFIG_AUFS_INO_T_64 + a[1] = u.a[1]; +#endif +} + +/* NFS file handle */ +enum { + Fh_br_id, + Fh_sigen, +#ifdef CONFIG_AUFS_INO_T_64 + /* support 64bit inode number */ + Fh_ino1, + Fh_ino2, + Fh_dir_ino1, + Fh_dir_ino2, +#else + Fh_ino1, + Fh_dir_ino1, +#endif + Fh_igen, + Fh_h_type, + Fh_tail, + + Fh_ino = Fh_ino1, + Fh_dir_ino = Fh_dir_ino1 +}; + +static int au_test_anon(struct dentry *dentry) +{ + return !!(dentry->d_flags & DCACHE_DISCONNECTED); +} + +/* ---------------------------------------------------------------------- */ +/* inode generation external table */ + +int au_xigen_inc(struct inode *inode) +{ + int err; + loff_t pos; + ssize_t sz; + __u32 igen; + struct super_block *sb; + struct au_sbinfo *sbinfo; + + err = 0; + sb = inode->i_sb; + if (unlikely(!au_opt_test(au_mntflags(sb), XINO))) + goto out; + + pos = inode->i_ino; + pos *= sizeof(igen); + igen = inode->i_generation + 1; + sbinfo = au_sbi(sb); + sz = xino_fwrite(sbinfo->si_xwrite, sbinfo->si_xigen, &igen, + sizeof(igen), &pos); + if (sz == sizeof(igen)) + goto out; /* success */ + + err = sz; + if (unlikely(sz >= 0)) { + err = -EIO; + AuIOErr("xigen error (%zd)\n", sz); + } + + out: + return err; +} + +int au_xigen_new(struct inode *inode) +{ + int err; + loff_t pos; + ssize_t sz; + struct super_block *sb; + struct au_sbinfo *sbinfo; + struct file *file; + + err = 0; + /* todo: dirty, at mount time */ + if (inode->i_ino == AUFS_ROOT_INO) + goto out; + sb = inode->i_sb; + if (unlikely(!au_opt_test(au_mntflags(sb), XINO))) + goto out; + + err = -EFBIG; + pos = inode->i_ino; + if (unlikely(au_loff_max / sizeof(inode->i_generation) - 1 < pos)) { + AuIOErr1("too large i%lld\n", pos); + goto out; + } + pos *= sizeof(inode->i_generation); + + err = 0; + sbinfo = au_sbi(sb); + file = sbinfo->si_xigen; + BUG_ON(!file); + + if (i_size_read(file->f_dentry->d_inode) + < pos + sizeof(inode->i_generation)) { + inode->i_generation = atomic_inc_return(&sbinfo->si_xigen_next); + sz = xino_fwrite(sbinfo->si_xwrite, file, &inode->i_generation, + sizeof(inode->i_generation), &pos); + } else + sz = xino_fread(sbinfo->si_xread, file, &inode->i_generation, + sizeof(inode->i_generation), &pos); + if (sz == sizeof(inode->i_generation)) + goto out; /* success */ + + err = sz; + if (unlikely(sz >= 0)) { + err = -EIO; + AuIOErr("xigen error (%zd)\n", sz); + } + + out: + return err; +} + +int au_xigen_set(struct super_block *sb, struct file *base) +{ + int err; + struct au_sbinfo *sbinfo; + struct file *file; + + sbinfo = au_sbi(sb); + file = au_xino_create2(base, sbinfo->si_xigen); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + err = 0; + if (sbinfo->si_xigen) + fput(sbinfo->si_xigen); + sbinfo->si_xigen = file; + + out: + return err; +} + +void au_xigen_clr(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + + sbinfo = au_sbi(sb); + if (sbinfo->si_xigen) { + fput(sbinfo->si_xigen); + sbinfo->si_xigen = NULL; + } +} + +/* ---------------------------------------------------------------------- */ + +static struct dentry *decode_by_ino(struct super_block *sb, ino_t ino, + ino_t dir_ino) +{ + struct dentry *dentry, *d; + struct inode *inode; + unsigned int sigen; + + dentry = NULL; + inode = ilookup(sb, ino); + if (!inode) + goto out; + + dentry = ERR_PTR(-ESTALE); + sigen = au_sigen(sb); + if (unlikely(is_bad_inode(inode) + || IS_DEADDIR(inode) + || sigen != au_iigen(inode))) + goto out_iput; + + dentry = NULL; + if (!dir_ino || S_ISDIR(inode->i_mode)) + dentry = d_find_alias(inode); + else { + spin_lock(&dcache_lock); + list_for_each_entry(d, &inode->i_dentry, d_alias) + if (!au_test_anon(d) + && d->d_parent->d_inode->i_ino == dir_ino) { + dentry = dget_locked(d); + break; + } + spin_unlock(&dcache_lock); + } + if (unlikely(dentry && sigen != au_digen(dentry))) { + dput(dentry); + dentry = ERR_PTR(-ESTALE); + } + + out_iput: + iput(inode); + out: + return dentry; +} + +/* ---------------------------------------------------------------------- */ + +/* todo: dirty? */ +/* if exportfs_decode_fh() passed vfsmount*, we could be happy */ +static struct vfsmount *au_mnt_get(struct super_block *sb) +{ + struct mnt_namespace *ns; + struct vfsmount *pos, *mnt; + + spin_lock(&vfsmount_lock); + /* no get/put ?? */ + AuDebugOn(!current->nsproxy); + ns = current->nsproxy->mnt_ns; + AuDebugOn(!ns); + mnt = NULL; + /* the order (reverse) will not be a problem */ + list_for_each_entry(pos, &ns->list, mnt_list) + if (pos->mnt_sb == sb) { + mnt = mntget(pos); + break; + } + spin_unlock(&vfsmount_lock); + AuDebugOn(!mnt); + + return mnt; +} + +struct au_nfsd_si_lock { + const unsigned int sigen; + const aufs_bindex_t br_id; + unsigned char force_lock; +}; + +static aufs_bindex_t si_nfsd_read_lock(struct super_block *sb, + struct au_nfsd_si_lock *nsi_lock) +{ + aufs_bindex_t bindex; + + si_read_lock(sb, AuLock_FLUSH); + + /* branch id may be wrapped around */ + bindex = au_br_index(sb, nsi_lock->br_id); + if (bindex >= 0 && nsi_lock->sigen + AUFS_BRANCH_MAX > au_sigen(sb)) + goto out; /* success */ + + if (!nsi_lock->force_lock) + si_read_unlock(sb); + bindex = -1; + + out: + return bindex; +} + +struct find_name_by_ino { + int called, found; + ino_t ino; + char *name; + int namelen; +}; + +static int +find_name_by_ino(void *arg, const char *name, int namelen, loff_t offset, + u64 ino, unsigned int d_type) +{ + struct find_name_by_ino *a = arg; + + a->called++; + if (a->ino != ino) + return 0; + + memcpy(a->name, name, namelen); + a->namelen = namelen; + a->found = 1; + return 1; +} + +static struct dentry *au_lkup_by_ino(struct path *path, ino_t ino, + struct au_nfsd_si_lock *nsi_lock) +{ + struct dentry *dentry, *parent; + struct file *file; + struct inode *dir; + struct find_name_by_ino arg; + int err; + + parent = path->dentry; + if (nsi_lock) + si_read_unlock(parent->d_sb); + path_get(path); + file = dentry_open(parent, path->mnt, au_dir_roflags, current_cred()); + dentry = (void *)file; + if (IS_ERR(file)) + goto out; + + dentry = ERR_PTR(-ENOMEM); + arg.name = __getname(); + if (unlikely(!arg.name)) + goto out_file; + arg.ino = ino; + arg.found = 0; + do { + arg.called = 0; + /* smp_mb(); */ + err = vfsub_readdir(file, find_name_by_ino, &arg); + } while (!err && !arg.found && arg.called); + dentry = ERR_PTR(err); + if (unlikely(err)) + goto out_name; + dentry = ERR_PTR(-ENOENT); + if (!arg.found) + goto out_name; + + /* do not call au_lkup_one() */ + dir = parent->d_inode; + mutex_lock(&dir->i_mutex); + dentry = vfsub_lookup_one_len(arg.name, parent, arg.namelen); + mutex_unlock(&dir->i_mutex); + AuTraceErrPtr(dentry); + if (IS_ERR(dentry)) + goto out_name; + AuDebugOn(au_test_anon(dentry)); + if (unlikely(!dentry->d_inode)) { + dput(dentry); + dentry = ERR_PTR(-ENOENT); + } + + out_name: + __putname(arg.name); + out_file: + fput(file); + out: + if (unlikely(nsi_lock + && si_nfsd_read_lock(parent->d_sb, nsi_lock) < 0)) + if (!IS_ERR(dentry)) { + dput(dentry); + dentry = ERR_PTR(-ESTALE); + } + AuTraceErrPtr(dentry); + return dentry; +} + +static struct dentry *decode_by_dir_ino(struct super_block *sb, ino_t ino, + ino_t dir_ino, + struct au_nfsd_si_lock *nsi_lock) +{ + struct dentry *dentry; + struct path path; + + if (dir_ino != AUFS_ROOT_INO) { + path.dentry = decode_by_ino(sb, dir_ino, 0); + dentry = path.dentry; + if (!path.dentry || IS_ERR(path.dentry)) + goto out; + AuDebugOn(au_test_anon(path.dentry)); + } else + path.dentry = dget(sb->s_root); + + path.mnt = au_mnt_get(sb); + dentry = au_lkup_by_ino(&path, ino, nsi_lock); + path_put(&path); + + out: + AuTraceErrPtr(dentry); + return dentry; +} + +/* ---------------------------------------------------------------------- */ + +static int h_acceptable(void *expv, struct dentry *dentry) +{ + return 1; +} + +static char *au_build_path(struct dentry *h_parent, struct path *h_rootpath, + char *buf, int len, struct super_block *sb) +{ + char *p; + int n; + struct path path; + + p = d_path(h_rootpath, buf, len); + if (IS_ERR(p)) + goto out; + n = strlen(p); + + path.mnt = h_rootpath->mnt; + path.dentry = h_parent; + p = d_path(&path, buf, len); + if (IS_ERR(p)) + goto out; + if (n != 1) + p += n; + + path.mnt = au_mnt_get(sb); + path.dentry = sb->s_root; + p = d_path(&path, buf, len - strlen(p)); + mntput(path.mnt); + if (IS_ERR(p)) + goto out; + if (n != 1) + p[strlen(p)] = '/'; + + out: + AuTraceErrPtr(p); + return p; +} + +static +struct dentry *decode_by_path(struct super_block *sb, aufs_bindex_t bindex, + ino_t ino, __u32 *fh, int fh_len, + struct au_nfsd_si_lock *nsi_lock) +{ + struct dentry *dentry, *h_parent, *root; + struct super_block *h_sb; + char *pathname, *p; + struct vfsmount *h_mnt; + struct au_branch *br; + int err; + struct path path; + + br = au_sbr(sb, bindex); + /* au_br_get(br); */ + h_mnt = br->br_mnt; + h_sb = h_mnt->mnt_sb; + /* todo: call lower fh_to_dentry()? fh_to_parent()? */ + h_parent = exportfs_decode_fh(h_mnt, (void *)(fh + Fh_tail), + fh_len - Fh_tail, fh[Fh_h_type], + h_acceptable, /*context*/NULL); + dentry = h_parent; + if (unlikely(!h_parent || IS_ERR(h_parent))) { + AuWarn1("%s decode_fh failed, %ld\n", + au_sbtype(h_sb), PTR_ERR(h_parent)); + goto out; + } + dentry = NULL; + if (unlikely(au_test_anon(h_parent))) { + AuWarn1("%s decode_fh returned a disconnected dentry\n", + au_sbtype(h_sb)); + goto out_h_parent; + } + + dentry = ERR_PTR(-ENOMEM); + pathname = (void *)__get_free_page(GFP_NOFS); + if (unlikely(!pathname)) + goto out_h_parent; + + root = sb->s_root; + path.mnt = h_mnt; + di_read_lock_parent(root, !AuLock_IR); + path.dentry = au_h_dptr(root, bindex); + di_read_unlock(root, !AuLock_IR); + p = au_build_path(h_parent, &path, pathname, PAGE_SIZE, sb); + dentry = (void *)p; + if (IS_ERR(p)) + goto out_pathname; + + si_read_unlock(sb); + err = vfsub_kern_path(p, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); + dentry = ERR_PTR(err); + if (unlikely(err)) + goto out_relock; + + dentry = ERR_PTR(-ENOENT); + AuDebugOn(au_test_anon(path.dentry)); + if (unlikely(!path.dentry->d_inode)) + goto out_path; + + if (ino != path.dentry->d_inode->i_ino) + dentry = au_lkup_by_ino(&path, ino, /*nsi_lock*/NULL); + else + dentry = dget(path.dentry); + + out_path: + path_put(&path); + out_relock: + if (unlikely(si_nfsd_read_lock(sb, nsi_lock) < 0)) + if (!IS_ERR(dentry)) { + dput(dentry); + dentry = ERR_PTR(-ESTALE); + } + out_pathname: + free_page((unsigned long)pathname); + out_h_parent: + dput(h_parent); + out: + /* au_br_put(br); */ + AuTraceErrPtr(dentry); + return dentry; +} + +/* ---------------------------------------------------------------------- */ + +static struct dentry * +aufs_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, + int fh_type) +{ + struct dentry *dentry; + __u32 *fh = fid->raw; + ino_t ino, dir_ino; + aufs_bindex_t bindex; + struct au_nfsd_si_lock nsi_lock = { + .sigen = fh[Fh_sigen], + .br_id = fh[Fh_br_id], + .force_lock = 0 + }; + + AuDebugOn(fh_len < Fh_tail); + + dentry = ERR_PTR(-ESTALE); + /* branch id may be wrapped around */ + bindex = si_nfsd_read_lock(sb, &nsi_lock); + if (unlikely(bindex < 0)) + goto out; + nsi_lock.force_lock = 1; + + /* is this inode still cached? */ + ino = decode_ino(fh + Fh_ino); + AuDebugOn(ino == AUFS_ROOT_INO); + dir_ino = decode_ino(fh + Fh_dir_ino); + dentry = decode_by_ino(sb, ino, dir_ino); + if (IS_ERR(dentry)) + goto out_unlock; + if (dentry) + goto accept; + + /* is the parent dir cached? */ + dentry = decode_by_dir_ino(sb, ino, dir_ino, &nsi_lock); + if (IS_ERR(dentry)) + goto out_unlock; + if (dentry) + goto accept; + + /* lookup path */ + dentry = decode_by_path(sb, bindex, ino, fh, fh_len, &nsi_lock); + if (IS_ERR(dentry)) + goto out_unlock; + if (unlikely(!dentry)) + /* todo?: make it ESTALE */ + goto out_unlock; + + accept: + if (dentry->d_inode->i_generation == fh[Fh_igen]) + goto out_unlock; /* success */ + + dput(dentry); + dentry = ERR_PTR(-ESTALE); + out_unlock: + si_read_unlock(sb); + out: + AuTraceErrPtr(dentry); + return dentry; +} + +#if 0 /* reserved for future use */ +/* support subtreecheck option */ +static struct dentry *aufs_fh_to_parent(struct super_block *sb, struct fid *fid, + int fh_len, int fh_type) +{ + struct dentry *parent; + __u32 *fh = fid->raw; + ino_t dir_ino; + + dir_ino = decode_ino(fh + Fh_dir_ino); + parent = decode_by_ino(sb, dir_ino, 0); + if (IS_ERR(parent)) + goto out; + if (!parent) + parent = decode_by_path(sb, au_br_index(sb, fh[Fh_br_id]), + dir_ino, fh, fh_len); + + out: + AuTraceErrPtr(parent); + return parent; +} +#endif + +/* ---------------------------------------------------------------------- */ + +static int aufs_encode_fh(struct dentry *dentry, __u32 *fh, int *max_len, + int connectable) +{ + int err; + aufs_bindex_t bindex, bend; + struct super_block *sb, *h_sb; + struct inode *inode; + struct dentry *parent, *h_parent; + struct au_branch *br; + + AuDebugOn(au_test_anon(dentry)); + + parent = NULL; + err = -ENOSPC; + if (unlikely(*max_len <= Fh_tail)) { + AuWarn1("NFSv2 client (max_len %d)?\n", *max_len); + goto out; + } + + err = FILEID_ROOT; + if (IS_ROOT(dentry)) { + AuDebugOn(dentry->d_inode->i_ino != AUFS_ROOT_INO); + goto out; + } + + err = -EIO; + h_parent = NULL; + sb = dentry->d_sb; + aufs_read_lock(dentry, AuLock_FLUSH | AuLock_IR); + parent = dget_parent(dentry); + di_read_lock_parent(parent, !AuLock_IR); + inode = dentry->d_inode; + AuDebugOn(!inode); +#ifdef CONFIG_AUFS_DEBUG + if (unlikely(!au_opt_test(au_mntflags(sb), XINO))) + AuWarn1("NFS-exporting requires xino\n"); +#endif + + bend = au_dbtaildir(parent); + for (bindex = au_dbstart(parent); bindex <= bend; bindex++) { + h_parent = au_h_dptr(parent, bindex); + if (h_parent) { + dget(h_parent); + break; + } + } + if (unlikely(!h_parent)) + goto out_unlock; + + err = -EPERM; + br = au_sbr(sb, bindex); + h_sb = br->br_mnt->mnt_sb; + if (unlikely(!h_sb->s_export_op)) { + AuErr1("%s branch is not exportable\n", au_sbtype(h_sb)); + goto out_dput; + } + + fh[Fh_br_id] = br->br_id; + fh[Fh_sigen] = au_sigen(sb); + encode_ino(fh + Fh_ino, inode->i_ino); + encode_ino(fh + Fh_dir_ino, parent->d_inode->i_ino); + fh[Fh_igen] = inode->i_generation; + + *max_len -= Fh_tail; + fh[Fh_h_type] = exportfs_encode_fh(h_parent, (void *)(fh + Fh_tail), + max_len, + /*connectable or subtreecheck*/0); + err = fh[Fh_h_type]; + *max_len += Fh_tail; + /* todo: macros? */ + if (err != 255) + err = 99; + else + AuWarn1("%s encode_fh failed\n", au_sbtype(h_sb)); + + out_dput: + dput(h_parent); + out_unlock: + di_read_unlock(parent, !AuLock_IR); + dput(parent); + aufs_read_unlock(dentry, AuLock_IR); + out: + if (unlikely(err < 0)) + err = 255; + return err; +} + +/* ---------------------------------------------------------------------- */ + +static struct export_operations aufs_export_op = { + .fh_to_dentry = aufs_fh_to_dentry, + /* .fh_to_parent = aufs_fh_to_parent, */ + .encode_fh = aufs_encode_fh +}; + +void au_export_init(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + __u32 u; + + sb->s_export_op = &aufs_export_op; + sbinfo = au_sbi(sb); + sbinfo->si_xigen = NULL; + get_random_bytes(&u, sizeof(u)); + BUILD_BUG_ON(sizeof(u) != sizeof(int)); + atomic_set(&sbinfo->si_xigen_next, u); +} diff -urN ./fs/aufs/f_op.c ./fs/aufs/f_op.c --- ./fs/aufs/f_op.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/f_op.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,551 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * file and vm operations + */ + +#include +#include +#include "aufs.h" + +/* common function to regular file and dir */ +int aufs_flush(struct file *file, fl_owner_t id) +{ + int err; + aufs_bindex_t bindex, bend; + struct dentry *dentry; + struct file *h_file; + + dentry = file->f_dentry; + si_noflush_read_lock(dentry->d_sb); + fi_read_lock(file); + di_read_lock_child(dentry, AuLock_IW); + + err = 0; + bend = au_fbend(file); + for (bindex = au_fbstart(file); !err && bindex <= bend; bindex++) { + h_file = au_h_fptr(file, bindex); + if (!h_file || !h_file->f_op->flush) + continue; + + err = h_file->f_op->flush(h_file, id); + if (!err) + vfsub_update_h_iattr(&h_file->f_path, /*did*/NULL); + /*ignore*/ + } + au_cpup_attr_timesizes(dentry->d_inode); + + di_read_unlock(dentry, AuLock_IW); + fi_read_unlock(file); + si_read_unlock(dentry->d_sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int do_open_nondir(struct file *file, int flags) +{ + int err; + aufs_bindex_t bindex; + struct file *h_file; + struct dentry *dentry; + + err = 0; + dentry = file->f_dentry; + au_fi(file)->fi_h_vm_ops = NULL; + bindex = au_dbstart(dentry); + /* O_TRUNC is processed already */ + BUG_ON(au_test_ro(dentry->d_sb, bindex, dentry->d_inode) + && (flags & O_TRUNC)); + + h_file = au_h_open(dentry, bindex, flags, file); + if (IS_ERR(h_file)) + err = PTR_ERR(h_file); + else { + au_set_fbstart(file, bindex); + au_set_fbend(file, bindex); + au_set_h_fptr(file, bindex, h_file); + au_update_figen(file); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + } + return err; +} + +static int aufs_open_nondir(struct inode *inode __maybe_unused, + struct file *file) +{ + return au_do_open(file, do_open_nondir); +} + +static int aufs_release_nondir(struct inode *inode __maybe_unused, + struct file *file) +{ + struct super_block *sb = file->f_dentry->d_sb; + + si_noflush_read_lock(sb); + au_finfo_fin(file); + si_read_unlock(sb); + return 0; +} + +/* ---------------------------------------------------------------------- */ + +static ssize_t aufs_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos) +{ + ssize_t err; + struct dentry *dentry; + struct file *h_file; + struct super_block *sb; + + dentry = file->f_dentry; + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/0); + if (unlikely(err)) + goto out; + + h_file = au_h_fptr(file, au_fbstart(file)); + err = vfsub_read_u(h_file, buf, count, ppos); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + fsstack_copy_attr_atime(dentry->d_inode, h_file->f_dentry->d_inode); + + di_read_unlock(dentry, AuLock_IR); + fi_read_unlock(file); + out: + si_read_unlock(sb); + return err; +} + +static ssize_t aufs_write(struct file *file, const char __user *ubuf, + size_t count, loff_t *ppos) +{ + ssize_t err; + aufs_bindex_t bstart; + struct au_pin pin; + struct dentry *dentry; + struct inode *inode; + struct super_block *sb; + struct file *h_file; + char __user *buf = (char __user *)ubuf; + + dentry = file->f_dentry; + sb = dentry->d_sb; + inode = dentry->d_inode; + mutex_lock(&inode->i_mutex); + si_read_lock(sb, AuLock_FLUSH); + + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/1); + if (unlikely(err)) + goto out; + + err = au_ready_to_write(file, -1, &pin); + di_downgrade_lock(dentry, AuLock_IR); + if (unlikely(err)) + goto out_unlock; + + bstart = au_fbstart(file); + h_file = au_h_fptr(file, bstart); + au_unpin(&pin); + err = vfsub_write_u(h_file, buf, count, ppos); + au_cpup_attr_timesizes(inode); + inode->i_mode = h_file->f_dentry->d_inode->i_mode; + + out_unlock: + di_read_unlock(dentry, AuLock_IR); + fi_write_unlock(file); + out: + si_read_unlock(sb); + mutex_unlock(&inode->i_mutex); + return err; +} + +static ssize_t aufs_splice_read(struct file *file, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + ssize_t err; + struct file *h_file; + struct dentry *dentry; + struct super_block *sb; + + dentry = file->f_dentry; + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/0); + if (unlikely(err)) + goto out; + + err = -EINVAL; + h_file = au_h_fptr(file, au_fbstart(file)); + if (au_test_loopback_kthread()) { + file->f_mapping = h_file->f_mapping; + smp_mb(); /* unnecessary? */ + } + err = vfsub_splice_to(h_file, ppos, pipe, len, flags); + /* todo: necessasry? */ + /* file->f_ra = h_file->f_ra; */ + fsstack_copy_attr_atime(dentry->d_inode, h_file->f_dentry->d_inode); + + di_read_unlock(dentry, AuLock_IR); + fi_read_unlock(file); + + out: + si_read_unlock(sb); + return err; +} + +static ssize_t +aufs_splice_write(struct pipe_inode_info *pipe, struct file *file, loff_t *ppos, + size_t len, unsigned int flags) +{ + ssize_t err; + struct au_pin pin; + struct dentry *dentry; + struct inode *inode; + struct super_block *sb; + struct file *h_file; + + dentry = file->f_dentry; + inode = dentry->d_inode; + mutex_lock(&inode->i_mutex); + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/1); + if (unlikely(err)) + goto out; + + err = au_ready_to_write(file, -1, &pin); + di_downgrade_lock(dentry, AuLock_IR); + if (unlikely(err)) + goto out_unlock; + + h_file = au_h_fptr(file, au_fbstart(file)); + au_unpin(&pin); + err = vfsub_splice_from(pipe, h_file, ppos, len, flags); + au_cpup_attr_timesizes(inode); + inode->i_mode = h_file->f_dentry->d_inode->i_mode; + + out_unlock: + di_read_unlock(dentry, AuLock_IR); + fi_write_unlock(file); + out: + si_read_unlock(sb); + mutex_unlock(&inode->i_mutex); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static struct file *au_safe_file(struct vm_area_struct *vma) +{ + struct file *file; + + file = vma->vm_file; + if (file->private_data && au_test_aufs(file->f_dentry->d_sb)) + return file; + return NULL; +} + +static void au_reset_file(struct vm_area_struct *vma, struct file *file) +{ + vma->vm_file = file; + /* smp_mb(); */ /* flush vm_file */ +} + +static int aufs_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + int err; + static DECLARE_WAIT_QUEUE_HEAD(wq); + struct file *file, *h_file; + struct au_finfo *finfo; + + /* todo: non-robr mode, user vm_file as it is? */ + wait_event(wq, (file = au_safe_file(vma))); + + /* do not revalidate, no si lock */ + finfo = au_fi(file); + h_file = finfo->fi_hfile[0 + finfo->fi_bstart].hf_file; + AuDebugOn(!h_file || !au_test_mmapped(file)); + + fi_write_lock(file); + vma->vm_file = h_file; + err = finfo->fi_h_vm_ops->fault(vma, vmf); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + au_reset_file(vma, file); + fi_write_unlock(file); +#if 0 /* def CONFIG_SMP */ + /* wake_up_nr(&wq, online_cpu - 1); */ + wake_up_all(&wq); +#else + wake_up(&wq); +#endif + + return err; +} + +static struct vm_operations_struct aufs_vm_ops = { + .fault = aufs_fault +}; + +/* ---------------------------------------------------------------------- */ + +static struct vm_operations_struct *au_vm_ops(struct file *h_file, + struct vm_area_struct *vma) +{ + struct vm_operations_struct *vm_ops; + int err; + + err = h_file->f_op->mmap(h_file, vma); + vm_ops = ERR_PTR(err); + if (unlikely(err)) + goto out; + vm_ops = vma->vm_ops; + err = do_munmap(current->mm, vma->vm_start, + vma->vm_end - vma->vm_start); + if (unlikely(err)) { + AuIOErr("failed internal unmapping %.*s, %d\n", + AuDLNPair(h_file->f_dentry), err); + vm_ops = ERR_PTR(-EIO); + } + + out: + return vm_ops; +} + +static int aufs_mmap(struct file *file, struct vm_area_struct *vma) +{ + int err; + unsigned char wlock, mmapped; + struct dentry *dentry; + struct super_block *sb; + struct file *h_file; + struct vm_operations_struct *vm_ops; + + dentry = file->f_dentry; + mmapped = !!au_test_mmapped(file); /* can be harmless race condition */ + wlock = !!(file->f_mode & FMODE_WRITE); + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, au_reopen_nondir, wlock | !mmapped); + if (unlikely(err)) + goto out; + + if (wlock) { + struct au_pin pin; + + err = au_ready_to_write(file, -1, &pin); + di_downgrade_lock(dentry, AuLock_IR); + if (unlikely(err)) + goto out_unlock; + au_unpin(&pin); + } else if (!mmapped) + di_downgrade_lock(dentry, AuLock_IR); + + h_file = au_h_fptr(file, au_fbstart(file)); + if (au_test_fs_bad_mapping(h_file->f_dentry->d_sb)) { + /* + * by this assignment, f_mapping will differs from aufs inode + * i_mapping. + * if someone else mixes the use of f_dentry->d_inode and + * f_mapping->host, then a problem may arise. + */ + file->f_mapping = h_file->f_mapping; + } + + vm_ops = NULL; + if (!mmapped) { + vm_ops = au_vm_ops(h_file, vma); + err = PTR_ERR(vm_ops); + if (IS_ERR(vm_ops)) + goto out_unlock; + } + + /* + * unnecessary to handle MAP_DENYWRITE and deny_write_access()? + * currently MAP_DENYWRITE from userspace is ignored, but elf loader + * sets it. when FMODE_EXEC is set (by open_exec() or sys_uselib()), + * both of the aufs file and the lower file is deny_write_access()-ed. + * finally I hope we can skip handlling MAP_DENYWRITE here. + */ + err = generic_file_mmap(file, vma); + if (unlikely(err)) + goto out_unlock; + vma->vm_ops = &aufs_vm_ops; + /* test again */ + if (!au_test_mmapped(file)) + au_fi(file)->fi_h_vm_ops = vm_ops; + + vfsub_file_accessed(h_file); + fsstack_copy_attr_atime(dentry->d_inode, h_file->f_dentry->d_inode); + + out_unlock: + di_read_unlock(dentry, AuLock_IR); + if (!wlock && mmapped) + fi_read_unlock(file); + else + fi_write_unlock(file); + out: + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static unsigned int aufs_poll(struct file *file, poll_table *wait) +{ + unsigned int mask; + int err; + struct file *h_file; + struct dentry *dentry; + struct super_block *sb; + + /* We should pretend an error happened. */ + mask = POLLERR /* | POLLIN | POLLOUT */; + dentry = file->f_dentry; + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/0); + if (unlikely(err)) + goto out; + + /* it is not an error if h_file has no operation */ + mask = DEFAULT_POLLMASK; + h_file = au_h_fptr(file, au_fbstart(file)); + if (h_file->f_op->poll) + mask = h_file->f_op->poll(h_file, wait); + + di_read_unlock(dentry, AuLock_IR); + fi_read_unlock(file); + + out: + si_read_unlock(sb); + AuTraceErr((int)mask); + return mask; +} + +static int aufs_fsync_nondir(struct file *file, struct dentry *dentry, + int datasync) +{ + int err; + struct au_pin pin; + struct inode *inode; + struct file *h_file; + struct super_block *sb; + + inode = dentry->d_inode; + IMustLock(file->f_mapping->host); + if (inode != file->f_mapping->host) { + mutex_unlock(&file->f_mapping->host->i_mutex); + mutex_lock(&inode->i_mutex); + } + IMustLock(inode); + + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + + err = 0; /* -EBADF; */ /* posix? */ + if (unlikely(!(file->f_mode & FMODE_WRITE))) + goto out; + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/1); + if (unlikely(err)) + goto out; + + err = au_ready_to_write(file, -1, &pin); + di_downgrade_lock(dentry, AuLock_IR); + if (unlikely(err)) + goto out_unlock; + au_unpin(&pin); + + err = -EINVAL; + h_file = au_h_fptr(file, au_fbstart(file)); + if (h_file->f_op->fsync) { + struct dentry *h_d; + struct mutex *h_mtx; + + /* + * no filemap_fdatawrite() since aufs file has no its own + * mapping, but dir. + */ + h_d = h_file->f_dentry; + h_mtx = &h_d->d_inode->i_mutex; + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + err = h_file->f_op->fsync(h_file, h_d, datasync); + if (!err) + vfsub_update_h_iattr(&h_file->f_path, /*did*/NULL); + /*ignore*/ + au_cpup_attr_timesizes(inode); + mutex_unlock(h_mtx); + } + + out_unlock: + di_read_unlock(dentry, AuLock_IR); + fi_write_unlock(file); + out: + si_read_unlock(sb); + if (inode != file->f_mapping->host) { + mutex_unlock(&inode->i_mutex); + mutex_lock(&file->f_mapping->host->i_mutex); + } + return err; +} + +static int aufs_fasync(int fd, struct file *file, int flag) +{ + int err; + struct file *h_file; + struct dentry *dentry; + struct super_block *sb; + + dentry = file->f_dentry; + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_reval_and_lock_fdi(file, au_reopen_nondir, /*wlock*/0); + if (unlikely(err)) + goto out; + + h_file = au_h_fptr(file, au_fbstart(file)); + if (h_file->f_op->fasync) + err = h_file->f_op->fasync(fd, h_file, flag); + + di_read_unlock(dentry, AuLock_IR); + fi_read_unlock(file); + + out: + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +const struct file_operations aufs_file_fop = { + /* + * while generic_file_llseek/_unlocked() don't use BKL, + * don't use it since it operates file->f_mapping->host. + * in aufs, it may be a real file and may confuse users by UDBA. + */ + /* .llseek = generic_file_llseek, */ + + .read = aufs_read, + .write = aufs_write, + .poll = aufs_poll, + .mmap = aufs_mmap, + .open = aufs_open_nondir, + .flush = aufs_flush, + .release = aufs_release_nondir, + .fsync = aufs_fsync_nondir, + .fasync = aufs_fasync, + .splice_write = aufs_splice_write, + .splice_read = aufs_splice_read +}; diff -urN ./fs/aufs/file.c ./fs/aufs/file.c --- ./fs/aufs/file.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/file.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,558 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * handling file/dir, and address_space operation + */ + +#include +#include +#include "aufs.h" + +/* + * a dirty trick for handling deny_write_access(). + * because FMODE_EXEC flag is not passed to f_op->open(), + * set it to file->private_data temporary. + */ +void au_store_oflag(struct nameidata *nd, struct inode *inode) +{ + if (nd + /* && !(nd->flags & LOOKUP_CONTINUE) */ + && (nd->flags & LOOKUP_OPEN) + && (nd->intent.open.flags & vfsub_fmode_to_uint(FMODE_EXEC)) + && inode + && S_ISREG(inode->i_mode)) { + /* suppress a warning in lp64 */ + unsigned long flags = nd->intent.open.flags; + nd->intent.open.file->private_data = (void *)flags; + /* smp_mb(); */ + } +} + +/* drop flags for writing */ +unsigned int au_file_roflags(unsigned int flags) +{ + flags &= ~(O_WRONLY | O_RDWR | O_APPEND | O_CREAT | O_TRUNC); + flags |= O_RDONLY | O_NOATIME; + return flags; +} + +/* common functions to regular file and dir */ +struct file *au_h_open(struct dentry *dentry, aufs_bindex_t bindex, int flags, + struct file *file) +{ + struct file *h_file; + struct dentry *h_dentry; + struct inode *h_inode; + struct super_block *sb; + struct au_branch *br; + int err; + + h_dentry = au_h_dptr(dentry, bindex); + h_inode = h_dentry->d_inode; + /* a race condition can happen between open and unlink/rmdir */ + h_file = ERR_PTR(-ENOENT); + if (unlikely((!d_unhashed(dentry) && d_unhashed(h_dentry)) + || !h_inode)) + goto out; + + sb = dentry->d_sb; + br = au_sbr(sb, bindex); + h_file = ERR_PTR(-EACCES); + if (file && (file->f_mode & FMODE_EXEC) + && (br->br_mnt->mnt_flags & MNT_NOEXEC)) + goto out; + + /* drop flags for writing */ + if (au_test_ro(sb, bindex, dentry->d_inode)) + flags = au_file_roflags(flags); + flags &= ~O_CREAT; + atomic_inc(&br->br_count); + h_file = dentry_open(dget(h_dentry), mntget(br->br_mnt), flags, + current_cred()); + if (IS_ERR(h_file)) + goto out_br; + AuDebugOn(!h_file->f_op); + + if (file && (file->f_mode & FMODE_EXEC)) { + h_file->f_mode |= FMODE_EXEC; + err = deny_write_access(h_file); + if (unlikely(err)) { + fput(h_file); + h_file = ERR_PTR(err); + goto out_br; + } + } + fsnotify_open(h_dentry); + goto out; /* success */ + + out_br: + atomic_dec(&br->br_count); + out: + return h_file; +} + +int au_do_open(struct file *file, int (*open)(struct file *file, int flags)) +{ + int err; + struct dentry *dentry; + struct super_block *sb; + + dentry = file->f_dentry; + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_finfo_init(file); + if (unlikely(err)) + goto out; + + di_read_lock_child(dentry, AuLock_IR); + err = open(file, file->f_flags); + di_read_unlock(dentry, AuLock_IR); + + fi_write_unlock(file); + if (unlikely(err)) + au_finfo_fin(file); + out: + si_read_unlock(sb); + return err; +} + +int au_reopen_nondir(struct file *file) +{ + int err; + aufs_bindex_t bstart, bindex, bend; + struct dentry *dentry; + struct file *h_file, *h_file_tmp; + + dentry = file->f_dentry; + bstart = au_dbstart(dentry); + h_file_tmp = NULL; + if (au_fbstart(file) == bstart) { + h_file = au_h_fptr(file, bstart); + if (file->f_mode == h_file->f_mode) + return 0; /* success */ + h_file_tmp = h_file; + get_file(h_file_tmp); + au_set_h_fptr(file, bstart, NULL); + } + AuDebugOn(au_fbstart(file) < bstart + || au_fi(file)->fi_hfile[0 + bstart].hf_file); + + h_file = au_h_open(dentry, bstart, file->f_flags & ~O_TRUNC, file); + err = PTR_ERR(h_file); + if (IS_ERR(h_file)) + goto out; /* todo: close all? */ + + err = 0; + au_set_fbstart(file, bstart); + au_set_h_fptr(file, bstart, h_file); + au_update_figen(file); + /* todo: necessary? */ + /* file->f_ra = h_file->f_ra; */ + + /* close lower files */ + bend = au_fbend(file); + for (bindex = bstart + 1; bindex <= bend; bindex++) + au_set_h_fptr(file, bindex, NULL); + au_set_fbend(file, bstart); + + out: + if (h_file_tmp) + fput(h_file_tmp); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int au_reopen_wh(struct file *file, aufs_bindex_t btgt, + struct dentry *hi_wh) +{ + int err; + aufs_bindex_t bstart; + struct au_dinfo *dinfo; + struct dentry *h_dentry; + + dinfo = au_di(file->f_dentry); + bstart = dinfo->di_bstart; + dinfo->di_bstart = btgt; + h_dentry = dinfo->di_hdentry[0 + btgt].hd_dentry; + dinfo->di_hdentry[0 + btgt].hd_dentry = hi_wh; + err = au_reopen_nondir(file); + dinfo->di_hdentry[0 + btgt].hd_dentry = h_dentry; + dinfo->di_bstart = bstart; + + return err; +} + +static int au_ready_to_write_wh(struct file *file, loff_t len, + aufs_bindex_t bcpup) +{ + int err; + struct inode *inode; + struct dentry *dentry, *hi_wh; + struct super_block *sb; + + dentry = file->f_dentry; + inode = dentry->d_inode; + hi_wh = au_hi_wh(inode, bcpup); + if (!hi_wh) + err = au_sio_cpup_wh(dentry, bcpup, len, file); + else + /* already copied-up after unlink */ + err = au_reopen_wh(file, bcpup, hi_wh); + + sb = dentry->d_sb; + if (!err && inode->i_nlink > 1 && au_opt_test(au_mntflags(sb), PLINK)) + au_plink_append(inode, bcpup, au_h_dptr(dentry, bcpup)); + + return err; +} + +/* + * prepare the @file for writing. + */ +int au_ready_to_write(struct file *file, loff_t len, struct au_pin *pin) +{ + int err; + aufs_bindex_t bstart, bcpup; + struct dentry *dentry, *parent, *h_dentry; + struct inode *h_inode, *inode; + struct super_block *sb; + + dentry = file->f_dentry; + sb = dentry->d_sb; + bstart = au_fbstart(file); + inode = dentry->d_inode; + err = au_test_ro(sb, bstart, inode); + if (!err && (au_h_fptr(file, bstart)->f_mode & FMODE_WRITE)) { + err = au_pin(pin, dentry, bstart, AuOpt_UDBA_NONE, /*flags*/0); + goto out; + } + + /* need to cpup */ + parent = dget_parent(dentry); + di_write_lock_parent(parent); + err = AuWbrCopyup(au_sbi(sb), dentry); + bcpup = err; + if (unlikely(err < 0)) + goto out_dgrade; + err = 0; + + if (!au_h_dptr(parent, bcpup)) { + err = au_cpup_dirs(dentry, bcpup); + if (unlikely(err)) + goto out_dgrade; + } + + err = au_pin(pin, dentry, bcpup, AuOpt_UDBA_NONE, + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + if (unlikely(err)) + goto out_dgrade; + + h_dentry = au_h_fptr(file, bstart)->f_dentry; + h_inode = h_dentry->d_inode; + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + if (d_unhashed(dentry) /* || d_unhashed(h_dentry) */ + /* || !h_inode->i_nlink */) { + err = au_ready_to_write_wh(file, len, bcpup); + di_downgrade_lock(parent, AuLock_IR); + } else { + di_downgrade_lock(parent, AuLock_IR); + if (!au_h_dptr(dentry, bcpup)) + err = au_sio_cpup_simple(dentry, bcpup, len, + AuCpup_DTIME); + if (!err) + err = au_reopen_nondir(file); + } + mutex_unlock(&h_inode->i_mutex); + + if (!err) { + au_pin_set_parent_lflag(pin, /*lflag*/0); + goto out_dput; /* success */ + } + au_unpin(pin); + goto out_unlock; + + out_dgrade: + di_downgrade_lock(parent, AuLock_IR); + out_unlock: + di_read_unlock(parent, AuLock_IR); + out_dput: + dput(parent); + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int au_file_refresh_by_inode(struct file *file, int *need_reopen) +{ + int err; + aufs_bindex_t bstart; + struct au_pin pin; + struct au_finfo *finfo; + struct dentry *dentry, *parent, *hi_wh; + struct inode *inode; + struct super_block *sb; + + err = 0; + finfo = au_fi(file); + dentry = file->f_dentry; + sb = dentry->d_sb; + inode = dentry->d_inode; + bstart = au_ibstart(inode); + if (bstart == finfo->fi_bstart) + goto out; + + parent = dget_parent(dentry); + if (au_test_ro(sb, bstart, inode)) { + di_read_lock_parent(parent, !AuLock_IR); + err = AuWbrCopyup(au_sbi(sb), dentry); + bstart = err; + di_read_unlock(parent, !AuLock_IR); + if (unlikely(err < 0)) + goto out_parent; + err = 0; + } + + di_read_lock_parent(parent, AuLock_IR); + hi_wh = au_hi_wh(inode, bstart); + if (au_opt_test(au_mntflags(sb), PLINK) + && au_plink_test(inode) + && !d_unhashed(dentry)) { + err = au_test_and_cpup_dirs(dentry, bstart); + if (unlikely(err)) + goto out_unlock; + + /* always superio. */ + err = au_pin(&pin, dentry, bstart, AuOpt_UDBA_NONE, + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + if (!err) + err = au_sio_cpup_simple(dentry, bstart, -1, + AuCpup_DTIME); + au_unpin(&pin); + } else if (hi_wh) { + /* already copied-up after unlink */ + err = au_reopen_wh(file, bstart, hi_wh); + *need_reopen = 0; + } + + out_unlock: + di_read_unlock(parent, AuLock_IR); + out_parent: + dput(parent); + out: + return err; +} + +static void au_do_refresh_file(struct file *file) +{ + aufs_bindex_t bindex, bend, new_bindex, brid; + struct au_hfile *p, tmp, *q; + struct au_finfo *finfo; + struct super_block *sb; + + sb = file->f_dentry->d_sb; + finfo = au_fi(file); + p = finfo->fi_hfile + finfo->fi_bstart; + brid = p->hf_br->br_id; + bend = finfo->fi_bend; + for (bindex = finfo->fi_bstart; bindex <= bend; bindex++, p++) { + if (!p->hf_file) + continue; + + new_bindex = au_br_index(sb, p->hf_br->br_id); + if (new_bindex == bindex) + continue; + if (new_bindex < 0) { + au_set_h_fptr(file, bindex, NULL); + continue; + } + + /* swap two lower inode, and loop again */ + q = finfo->fi_hfile + new_bindex; + tmp = *q; + *q = *p; + *p = tmp; + if (tmp.hf_file) { + bindex--; + p--; + } + } + + p = finfo->fi_hfile; + if (!au_test_mmapped(file) && !d_unhashed(file->f_dentry)) { + bend = au_sbend(sb); + for (finfo->fi_bstart = 0; finfo->fi_bstart <= bend; + finfo->fi_bstart++, p++) + if (p->hf_file) { + if (p->hf_file->f_dentry + && p->hf_file->f_dentry->d_inode) + break; + else + au_hfput(p, file); + } + } else { + bend = au_br_index(sb, brid); + for (finfo->fi_bstart = 0; finfo->fi_bstart < bend; + finfo->fi_bstart++, p++) + if (p->hf_file) + au_hfput(p, file); + bend = au_sbend(sb); + } + + p = finfo->fi_hfile + bend; + for (finfo->fi_bend = bend; finfo->fi_bend >= finfo->fi_bstart; + finfo->fi_bend--, p--) + if (p->hf_file) { + if (p->hf_file->f_dentry + && p->hf_file->f_dentry->d_inode) + break; + else + au_hfput(p, file); + } + AuDebugOn(finfo->fi_bend < finfo->fi_bstart); +} + +/* + * after branch manipulating, refresh the file. + */ +static int refresh_file(struct file *file, int (*reopen)(struct file *file)) +{ + int err, need_reopen; + struct dentry *dentry; + aufs_bindex_t bend, bindex; + + dentry = file->f_dentry; + err = au_fi_realloc(au_fi(file), au_sbend(dentry->d_sb) + 1); + if (unlikely(err)) + goto out; + au_do_refresh_file(file); + + err = 0; + need_reopen = 1; + if (!au_test_mmapped(file)) + err = au_file_refresh_by_inode(file, &need_reopen); + if (!err && need_reopen && !d_unhashed(dentry)) + err = reopen(file); + if (!err) { + au_update_figen(file); + return 0; /* success */ + } + + /* error, close all lower files */ + bend = au_fbend(file); + for (bindex = au_fbstart(file); bindex <= bend; bindex++) + au_set_h_fptr(file, bindex, NULL); + + out: + return err; +} + +/* common function to regular file and dir */ +int au_reval_and_lock_fdi(struct file *file, int (*reopen)(struct file *file), + int wlock) +{ + int err; + unsigned int sigen, figen; + aufs_bindex_t bstart; + unsigned char pseudo_link; + struct dentry *dentry; + + err = 0; + dentry = file->f_dentry; + sigen = au_sigen(dentry->d_sb); + fi_write_lock(file); + figen = au_figen(file); + di_write_lock_child(dentry); + bstart = au_dbstart(dentry); + pseudo_link = (bstart != au_ibstart(dentry->d_inode)); + if (sigen == figen && !pseudo_link && au_fbstart(file) == bstart) { + if (!wlock) { + di_downgrade_lock(dentry, AuLock_IR); + fi_downgrade_lock(file); + } + goto out; /* success */ + } + + AuDbg("sigen %d, figen %d\n", sigen, figen); + if (sigen != au_digen(dentry) + || sigen != au_iigen(dentry->d_inode)) { + err = au_reval_dpath(dentry, sigen); + if (unlikely(err < 0)) + goto out; + AuDebugOn(au_digen(dentry) != sigen + || au_iigen(dentry->d_inode) != sigen); + } + + err = refresh_file(file, reopen); + if (!err) { + if (!wlock) { + di_downgrade_lock(dentry, AuLock_IR); + fi_downgrade_lock(file); + } + } else { + di_write_unlock(dentry); + fi_write_unlock(file); + } + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* cf. aufs_nopage() */ +/* for madvise(2) */ +static int aufs_readpage(struct file *file __maybe_unused, struct page *page) +{ + unlock_page(page); + return 0; +} + +/* they will never be called. */ +#ifdef CONFIG_AUFS_DEBUG +static int aufs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) +{ AuUnsupport(); return 0; } +static int aufs_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) +{ AuUnsupport(); return 0; } +static int aufs_writepage(struct page *page, struct writeback_control *wbc) +{ AuUnsupport(); return 0; } +static void aufs_sync_page(struct page *page) +{ AuUnsupport(); } + +static int aufs_set_page_dirty(struct page *page) +{ AuUnsupport(); return 0; } +static void aufs_invalidatepage(struct page *page, unsigned long offset) +{ AuUnsupport(); } +static int aufs_releasepage(struct page *page, gfp_t gfp) +{ AuUnsupport(); return 0; } +static ssize_t aufs_direct_IO(int rw, struct kiocb *iocb, + const struct iovec *iov, loff_t offset, + unsigned long nr_segs) +{ AuUnsupport(); return 0; } +#endif /* CONFIG_AUFS_DEBUG */ + +struct address_space_operations aufs_aop = { + .readpage = aufs_readpage, +#ifdef CONFIG_AUFS_DEBUG + .writepage = aufs_writepage, + .sync_page = aufs_sync_page, + .set_page_dirty = aufs_set_page_dirty, + .write_begin = aufs_write_begin, + .write_end = aufs_write_end, + .invalidatepage = aufs_invalidatepage, + .releasepage = aufs_releasepage, + .direct_IO = aufs_direct_IO, +#endif /* CONFIG_AUFS_DEBUG */ +}; diff -urN ./fs/aufs/file.h ./fs/aufs/file.h --- ./fs/aufs/file.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/file.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,148 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * file operations + */ + +#ifndef __AUFS_FILE_H__ +#define __AUFS_FILE_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include "rwsem.h" + +struct au_branch; +struct au_hfile { + struct file *hf_file; + struct au_branch *hf_br; +}; + +struct au_vdir; +struct au_finfo { + atomic_t fi_generation; + + struct rw_semaphore fi_rwsem; + struct au_hfile *fi_hfile; + aufs_bindex_t fi_bstart, fi_bend; + + union { + /* non-dir only */ + struct vm_operations_struct *fi_h_vm_ops; + + /* dir only */ + struct { + struct au_vdir *fi_vdir_cache; + int fi_maintain_plink; + }; + }; +}; + +/* ---------------------------------------------------------------------- */ + +/* file.c */ +extern struct address_space_operations aufs_aop; +void au_store_oflag(struct nameidata *nd, struct inode *inode); +unsigned int au_file_roflags(unsigned int flags); +struct file *au_h_open(struct dentry *dentry, aufs_bindex_t bindex, int flags, + struct file *file); +int au_do_open(struct file *file, int (*open)(struct file *file, int flags)); +int au_reopen_nondir(struct file *file); +struct au_pin; +int au_ready_to_write(struct file *file, loff_t len, struct au_pin *pin); +int au_reval_and_lock_fdi(struct file *file, int (*reopen)(struct file *file), + int wlock); + +/* f_op.c */ +extern const struct file_operations aufs_file_fop; +int aufs_flush(struct file *file, fl_owner_t id); + +/* finfo.c */ +void au_hfput(struct au_hfile *hf, struct file *file); +void au_set_h_fptr(struct file *file, aufs_bindex_t bindex, + struct file *h_file); + +void au_update_figen(struct file *file); + +void au_finfo_fin(struct file *file); +int au_finfo_init(struct file *file); +int au_fi_realloc(struct au_finfo *finfo, int nbr); + +/* ---------------------------------------------------------------------- */ + +static inline struct au_finfo *au_fi(struct file *file) +{ + return file->private_data; +} + +/* ---------------------------------------------------------------------- */ + +/* + * fi_read_lock, fi_write_lock, + * fi_read_unlock, fi_write_unlock, fi_downgrade_lock + */ +AuSimpleRwsemFuncs(fi, struct file *f, &au_fi(f)->fi_rwsem); + +#define FiMustNoWaiters(f) AuRwMustNoWaiters(&au_fi(f)->fi_rwsem) + +/* ---------------------------------------------------------------------- */ + +/* todo: hard/soft set? */ +static inline aufs_bindex_t au_fbstart(struct file *file) +{ + return au_fi(file)->fi_bstart; +} + +static inline aufs_bindex_t au_fbend(struct file *file) +{ + return au_fi(file)->fi_bend; +} + +static inline struct au_vdir *au_fvdir_cache(struct file *file) +{ + return au_fi(file)->fi_vdir_cache; +} + +static inline void au_set_fbstart(struct file *file, aufs_bindex_t bindex) +{ + au_fi(file)->fi_bstart = bindex; +} + +static inline void au_set_fbend(struct file *file, aufs_bindex_t bindex) +{ + au_fi(file)->fi_bend = bindex; +} + +static inline void au_set_fvdir_cache(struct file *file, + struct au_vdir *vdir_cache) +{ + au_fi(file)->fi_vdir_cache = vdir_cache; +} + +static inline struct file *au_h_fptr(struct file *file, aufs_bindex_t bindex) +{ + return au_fi(file)->fi_hfile[0 + bindex].hf_file; +} + +/* todo: memory barrier? */ +static inline unsigned int au_figen(struct file *f) +{ + return atomic_read(&au_fi(f)->fi_generation); +} + +static inline int au_test_mmapped(struct file *f) +{ + return !!(au_fi(f)->fi_h_vm_ops); +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_FILE_H__ */ diff -urN ./fs/aufs/finfo.c ./fs/aufs/finfo.c --- ./fs/aufs/finfo.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/finfo.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,124 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * file private data + */ + +#include "aufs.h" + +void au_hfput(struct au_hfile *hf, struct file *file) +{ + if (file->f_mode & FMODE_EXEC) + allow_write_access(hf->hf_file); + fput(hf->hf_file); + hf->hf_file = NULL; + atomic_dec(&hf->hf_br->br_count); + hf->hf_br = NULL; +} + +void au_set_h_fptr(struct file *file, aufs_bindex_t bindex, struct file *val) +{ + struct au_finfo *finfo = au_fi(file); + struct au_hfile *hf; + + hf = finfo->fi_hfile + bindex; + if (hf->hf_file) + au_hfput(hf, file); + if (val) { + hf->hf_file = val; + hf->hf_br = au_sbr(file->f_dentry->d_sb, bindex); + } +} + +void au_update_figen(struct file *file) +{ + atomic_set(&au_fi(file)->fi_generation, au_digen(file->f_dentry)); + /* smp_mb(); */ /* atomic_set */ +} + +/* ---------------------------------------------------------------------- */ + +void au_finfo_fin(struct file *file) +{ + struct au_finfo *finfo; + aufs_bindex_t bindex, bend; + + fi_write_lock(file); + bend = au_fbend(file); + bindex = au_fbstart(file); + if (bindex >= 0) + /* + * calls fput() instead of filp_close(), + * since no dnotify or lock for the lower file. + */ + for (; bindex <= bend; bindex++) + au_set_h_fptr(file, bindex, NULL); + + finfo = au_fi(file); + au_dbg_verify_hf(finfo); + kfree(finfo->fi_hfile); + fi_write_unlock(file); + au_rwsem_destroy(&finfo->fi_rwsem); + au_cache_free_finfo(finfo); +} + +int au_finfo_init(struct file *file) +{ + struct au_finfo *finfo; + struct dentry *dentry; + unsigned long ul; + + dentry = file->f_dentry; + finfo = au_cache_alloc_finfo(); + if (unlikely(!finfo)) + goto out; + + finfo->fi_hfile = kcalloc(au_sbend(dentry->d_sb) + 1, + sizeof(*finfo->fi_hfile), GFP_NOFS); + if (unlikely(!finfo->fi_hfile)) + goto out_finfo; + + init_rwsem(&finfo->fi_rwsem); + down_write(&finfo->fi_rwsem); + finfo->fi_bstart = -1; + finfo->fi_bend = -1; + atomic_set(&finfo->fi_generation, au_digen(dentry)); + /* smp_mb(); */ /* atomic_set */ + + /* cf. au_store_oflag() */ + /* suppress a warning in lp64 */ + ul = (unsigned long)file->private_data; + file->f_mode |= (vfsub_uint_to_fmode(ul) & FMODE_EXEC); + file->private_data = finfo; + return 0; /* success */ + + out_finfo: + au_cache_free_finfo(finfo); + out: + return -ENOMEM; +} + +int au_fi_realloc(struct au_finfo *finfo, int nbr) +{ + int err, sz; + struct au_hfile *hfp; + + err = -ENOMEM; + sz = sizeof(*hfp) * (finfo->fi_bend + 1); + if (!sz) + sz = sizeof(*hfp); + hfp = au_kzrealloc(finfo->fi_hfile, sz, sizeof(*hfp) * nbr, GFP_NOFS); + if (hfp) { + finfo->fi_hfile = hfp; + err = 0; + } + + return err; +} diff -urN ./fs/aufs/fstype.h ./fs/aufs/fstype.h --- ./fs/aufs/fstype.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/fstype.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,454 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * judging filesystem type + */ + +#ifndef __AUFS_FSTYPE_H__ +#define __AUFS_FSTYPE_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include + +static inline int au_test_aufs(struct super_block *sb) +{ + return sb->s_magic == AUFS_SUPER_MAGIC; +} + +static inline const char *au_sbtype(struct super_block *sb) +{ + return sb->s_type->name; +} + +static inline int au_test_iso9660(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_ROMFS_FS) || defined(CONFIG_ROMFS_FS_MODULE) + return sb->s_magic == ROMFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_romfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_ISO9660_FS) || defined(CONFIG_ISO9660_FS_MODULE) + return sb->s_magic == ISOFS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_cramfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_CRAMFS) || defined(CONFIG_CRAMFS_MODULE) + return sb->s_magic == CRAMFS_MAGIC; +#endif + return 0; +} + +static inline int au_test_nfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_NFS_FS) || defined(CONFIG_NFS_FS_MODULE) + return sb->s_magic == NFS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_fuse(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_FUSE_FS) || defined(CONFIG_FUSE_FS_MODULE) + return sb->s_magic == FUSE_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_xfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_XFS_FS) || defined(CONFIG_XFS_FS_MODULE) + return sb->s_magic == XFS_SB_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_tmpfs(struct super_block *sb __maybe_unused) +{ +#ifdef CONFIG_TMPFS + return sb->s_magic == TMPFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_ecryptfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_ECRYPT_FS) || defined(CONFIG_ECRYPT_FS_MODULE) + return !strcmp(au_sbtype(sb), "ecryptfs"); +#else + return 0; +#endif +} + +static inline int au_test_smbfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_SMB_FS) || defined(CONFIG_SMB_FS_MODULE) + return sb->s_magic == SMB_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_ocfs2(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_OCFS2_FS) || defined(CONFIG_OCFS2_FS_MODULE) + return sb->s_magic == OCFS2_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_ocfs2_dlmfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_OCFS2_FS_O2CB) || defined(CONFIG_OCFS2_FS_O2CB_MODULE) + return sb->s_magic == DLMFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_coda(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_CODA_FS) || defined(CONFIG_CODA_FS_MODULE) + return sb->s_magic == CODA_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_v9fs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_9P_FS) || defined(CONFIG_9P_FS_MODULE) + return sb->s_magic == V9FS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_ext4(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_EXT4DEV_FS) || defined(CONFIG_EXT4DEV_FS_MODULE) + return sb->s_magic == EXT4_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_sysv(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_SYSV_FS) || defined(CONFIG_SYSV_FS_MODULE) + return !strcmp(au_sbtype(sb), "sysv"); +#else + return 0; +#endif +} + +static inline int au_test_ramfs(struct super_block *sb) +{ + return sb->s_magic == RAMFS_MAGIC; +} + +static inline int au_test_ubifs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_UBIFS_FS) || defined(CONFIG_UBIFS_FS_MODULE) + return sb->s_magic == UBIFS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_procfs(struct super_block *sb __maybe_unused) +{ +#ifdef CONFIG_PROC_FS + return sb->s_magic == PROC_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_sysfs(struct super_block *sb __maybe_unused) +{ +#ifdef CONFIG_SYSFS + return sb->s_magic == SYSFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_configfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_CONFIGFS_FS) || defined(CONFIG_CONFIGFS_FS_MODULE) + return sb->s_magic == CONFIGFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_minix(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_MINIX_FS) || defined(CONFIG_MINIX_FS_MODULE) + return sb->s_magic == MINIX3_SUPER_MAGIC + || sb->s_magic == MINIX2_SUPER_MAGIC + || sb->s_magic == MINIX2_SUPER_MAGIC2 + || sb->s_magic == MINIX_SUPER_MAGIC + || sb->s_magic == MINIX_SUPER_MAGIC2; +#else + return 0; +#endif +} + +static inline int au_test_cifs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_CIFS_FS) || defined(CONFIGCIFS_FS_MODULE) + return sb->s_magic == CIFS_MAGIC_NUMBER; +#else + return 0; +#endif +} + +static inline int au_test_fat(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_FAT_FS) || defined(CONFIG_FAT_FS_MODULE) + return sb->s_magic == MSDOS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_msdos(struct super_block *sb) +{ + return au_test_fat(sb); +} + +static inline int au_test_vfat(struct super_block *sb) +{ + return au_test_fat(sb); +} + +static inline int au_test_securityfs(struct super_block *sb __maybe_unused) +{ +#ifdef CONFIG_SECURITYFS + return sb->s_magic == SECURITYFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_squashfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_SQUASHFS) || defined(CONFIG_SQUASHFS_MODULE) + return sb->s_magic == SQUASHFS_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_btrfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_BTRFS_FS) || defined(CONFIG_BTRFS_FS_MODULE) + return sb->s_magic == BTRFS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_xenfs(struct super_block *sb __maybe_unused) +{ +#if defined(CONFIG_XENFS) || defined(CONFIG_XENFS_MODULE) + return sb->s_magic == XENFS_SUPER_MAGIC; +#else + return 0; +#endif +} + +static inline int au_test_debugfs(struct super_block *sb __maybe_unused) +{ +#ifdef CONFIG_DEBUG_FS + return sb->s_magic == DEBUGFS_MAGIC; +#else + return 0; +#endif +} + +/* ---------------------------------------------------------------------- */ +/* + * they can't be an aufs branch. + */ +static inline int au_test_fs_unsuppoted(struct super_block *sb) +{ + return +#ifndef CONFIG_AUFS_BR_RAMFS + au_test_ramfs(sb) || +#endif + au_test_procfs(sb) + || au_test_sysfs(sb) + || au_test_configfs(sb) + || au_test_debugfs(sb) + || au_test_securityfs(sb) + || au_test_xenfs(sb) + /* || !strcmp(au_sbtype(sb), "unionfs") */ + || au_test_aufs(sb); /* will be supported in next version */ +} + +/* + * If the filesystem supports NFS-export, then it has to support NULL as + * a nameidata parameter for ->create(), ->lookup() and ->d_revalidate(). + * We can apply this principle when we handle a lower filesystem. + */ +static inline int au_test_fs_null_nd(struct super_block *sb) +{ + return !!sb->s_export_op; +} + +static inline int au_test_fs_remote(struct super_block *sb) +{ + return !au_test_tmpfs(sb) +#ifdef CONFIG_AUFS_BR_RAMFS + && !au_test_ramfs(sb) +#endif + && !(sb->s_type->fs_flags & FS_REQUIRES_DEV); +} + +/* ---------------------------------------------------------------------- */ + +/* + * Note: these functions (below) are created after reading ->getattr() in all + * filesystems under linux/fs. it means we have to do so in every update... + */ + +/* + * some filesystems require getattr to refresh the inode attributes before + * referencing. + * in most cases, we can rely on the inode attribute in NFS (or every remote fs) + * and leave the work for d_revalidate() + */ +static inline int au_test_fs_refresh_iattr(struct super_block *sb) +{ + return au_test_nfs(sb) + || au_test_fuse(sb) + /* || au_test_smbfs(sb) */ /* untested */ + /* || au_test_ocfs2(sb) */ /* untested */ + /* || au_test_btrfs(sb) */ /* untested */ + /* || au_test_coda(sb) */ /* untested */ + /* || au_test_v9fs(sb) */ /* untested */ + ; +} + +/* + * filesystems which don't maintain i_size or i_blocks. + */ +static inline int au_test_fs_bad_iattr_size(struct super_block *sb) +{ + return au_test_xfs(sb) + /* || au_test_ext4(sb) */ /* untested */ + /* || au_test_ocfs2(sb) */ /* untested */ + /* || au_test_ocfs2_dlmfs(sb) */ /* untested */ + /* || au_test_sysv(sb) */ /* untested */ + /* || au_test_ubifs(sb) */ /* untested */ + /* || au_test_minix(sb) */ /* untested */ + ; +} + +/* + * filesystems which don't store the correct value in some of their inode + * attributes. + */ +static inline int au_test_fs_bad_iattr(struct super_block *sb) +{ + return au_test_fs_bad_iattr_size(sb) + /* || au_test_cifs(sb) */ /* untested */ + || au_test_fat(sb) + || au_test_msdos(sb) + || au_test_vfat(sb); +} + +/* + * filesystems which sets S_NOATIME and S_NOCMTIME. + */ +static inline int au_test_fs_notime(struct super_block *sb) +{ + return au_test_nfs(sb) + || au_test_fuse(sb) + /* || au_test_cifs(sb) */ /* untested */ + /* || au_test_ubifs(sb) */ /* untested */ + ; +} + +/* + * filesystems which requires replacing i_mapping. + */ +static inline int au_test_fs_bad_mapping(struct super_block *sb) +{ + return au_test_fuse(sb); +} + +/* temporary support for i#1 in cramfs */ +static inline int au_test_fs_unique_ino(struct inode *inode) +{ + if (au_test_cramfs(inode->i_sb)) + return inode->i_ino != 1; + return 1; +} + +/* ---------------------------------------------------------------------- */ + +/* + * the filesystem where the xino files placed must support i/o after unlink and + * maintain i_size and i_blocks. + */ +static inline int au_test_fs_bad_xino(struct super_block *sb) +{ + return au_test_fs_remote(sb) + || au_test_fs_bad_iattr_size(sb) +#ifdef CONFIG_AUFS_BR_RAMFS + || !(au_test_ramfs(sb) || au_test_fs_null_nd(sb)) +#else + || !au_test_fs_null_nd(sb) /* to keep xino code simple */ +#endif + /* don't want unnecessary work for xino */ + || au_test_aufs(sb) + || au_test_ecryptfs(sb); +} + +static inline int au_test_fs_trunc_xino(struct super_block *sb) +{ + return au_test_tmpfs(sb) + || au_test_ramfs(sb); +} + +/* + * test if the @sb is real-readonly. + */ +static inline int au_test_fs_rr(struct super_block *sb) +{ + return au_test_squashfs(sb) + || au_test_iso9660(sb) + || au_test_cramfs(sb) + || au_test_romfs(sb); +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_FSTYPE_H__ */ diff -urN ./fs/aufs/hinotify.c ./fs/aufs/hinotify.c --- ./fs/aufs/hinotify.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/hinotify.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,746 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inotify for the lower directories + */ + +#include "aufs.h" + +static const __u32 AuHinMask = (IN_MOVE | IN_DELETE | IN_CREATE); +static struct inotify_handle *au_hin_handle; + +AuCacheFuncs(hinotify, HINOTIFY); + +int au_hin_alloc(struct au_hinode *hinode, struct inode *inode, + struct inode *h_inode) +{ + int err; + struct au_hinotify *hin; + s32 wd; + + err = -ENOMEM; + hin = au_cache_alloc_hinotify(); + if (hin) { + AuDebugOn(hinode->hi_notify); + hinode->hi_notify = hin; + hin->hin_aufs_inode = inode; + + inotify_init_watch(&hin->hin_watch); + wd = inotify_add_watch(au_hin_handle, &hin->hin_watch, h_inode, + AuHinMask); + if (wd >= 0) + return 0; /* success */ + + err = wd; + put_inotify_watch(&hin->hin_watch); + au_cache_free_hinotify(hin); + hinode->hi_notify = NULL; + } + + return err; +} + +void au_hin_free(struct au_hinode *hinode) +{ + int err; + struct au_hinotify *hin; + + hin = hinode->hi_notify; + if (hin) { + err = 0; + if (atomic_read(&hin->hin_watch.count)) + err = inotify_rm_watch(au_hin_handle, &hin->hin_watch); + if (unlikely(err)) + /* it means the watch is already removed */ + AuWarn("failed inotify_rm_watch() %d\n", err); + au_cache_free_hinotify(hin); + hinode->hi_notify = NULL; + } +} + +/* ---------------------------------------------------------------------- */ + +void au_hin_ctl(struct au_hinode *hinode, int do_set) +{ + struct inode *h_inode; + struct inotify_watch *watch; + + if (!hinode->hi_notify) + return; + + h_inode = hinode->hi_inode; + IMustLock(h_inode); + + /* todo: try inotify_find_update_watch()? */ + watch = &hinode->hi_notify->hin_watch; + mutex_lock(&h_inode->inotify_mutex); + /* mutex_lock(&watch->ih->mutex); */ + if (do_set) { + AuDebugOn(watch->mask & AuHinMask); + watch->mask |= AuHinMask; + } else { + AuDebugOn(!(watch->mask & AuHinMask)); + watch->mask &= ~AuHinMask; + } + /* mutex_unlock(&watch->ih->mutex); */ + mutex_unlock(&h_inode->inotify_mutex); +} + +void au_reset_hinotify(struct inode *inode, unsigned int flags) +{ + aufs_bindex_t bindex, bend; + struct inode *hi; + struct dentry *iwhdentry; + + bend = au_ibend(inode); + for (bindex = au_ibstart(inode); bindex <= bend; bindex++) { + hi = au_h_iptr(inode, bindex); + if (!hi) + continue; + + /* mutex_lock_nested(&hi->i_mutex, AuLsc_I_CHILD); */ + iwhdentry = au_hi_wh(inode, bindex); + if (iwhdentry) + dget(iwhdentry); + au_igrab(hi); + au_set_h_iptr(inode, bindex, NULL, 0); + au_set_h_iptr(inode, bindex, au_igrab(hi), + flags & ~AuHi_XINO); + iput(hi); + dput(iwhdentry); + /* mutex_unlock(&hi->i_mutex); */ + } +} + +/* ---------------------------------------------------------------------- */ + +static int hin_xino(struct inode *inode, struct inode *h_inode) +{ + int err; + aufs_bindex_t bindex, bend, bfound, bstart; + struct inode *h_i; + + err = 0; + if (unlikely(inode->i_ino == AUFS_ROOT_INO)) { + AuWarn("branch root dir was changed\n"); + goto out; + } + + bfound = -1; + bend = au_ibend(inode); + bstart = au_ibstart(inode); +#if 0 /* reserved for future use */ + if (bindex == bend) { + /* keep this ino in rename case */ + goto out; + } +#endif + for (bindex = bstart; bindex <= bend; bindex++) { + if (au_h_iptr(inode, bindex) == h_inode) { + bfound = bindex; + break; + } + } + if (bfound < 0) + goto out; + + for (bindex = bstart; bindex <= bend; bindex++) { + h_i = au_h_iptr(inode, bindex); + if (!h_i) + continue; + + err = au_xino_write0(inode->i_sb, bindex, h_i->i_ino, 0); + /* ignore this error */ + /* bad action? */ + } + + /* children inode number will be broken */ + + out: + AuTraceErr(err); + return err; +} + +static int hin_gen_tree(struct dentry *dentry) +{ + int err, i, j, ndentry; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + struct dentry **dentries; + + err = au_dpages_init(&dpages, GFP_NOFS); + if (unlikely(err)) + goto out; + err = au_dcsub_pages(&dpages, dentry, NULL, NULL); + if (unlikely(err)) + goto out_dpages; + + for (i = 0; i < dpages.ndpage; i++) { + dpage = dpages.dpages + i; + dentries = dpage->dentries; + ndentry = dpage->ndentry; + for (j = 0; j < ndentry; j++) { + struct dentry *d; + + d = dentries[j]; + if (IS_ROOT(d)) + continue; + + d_drop(d); + au_digen_dec(d); + if (d->d_inode) + /* todo: reset children xino? + cached children only? */ + au_iigen_dec(d->d_inode); + } + } + + out_dpages: + au_dpages_free(&dpages); + + /* discard children */ + dentry_unhash(dentry); + dput(dentry); + out: + return err; +} + +/* + * return 0 if processed. + */ +static int hin_gen_by_inode(char *name, unsigned int nlen, struct inode *inode, + const unsigned int isdir) +{ + int err; + struct dentry *d; + struct qstr *dname; + + err = 1; + if (unlikely(inode->i_ino == AUFS_ROOT_INO)) { + AuWarn("branch root dir was changed\n"); + err = 0; + goto out; + } + + if (!isdir) { + AuDebugOn(!name); + au_iigen_dec(inode); + spin_lock(&dcache_lock); + list_for_each_entry(d, &inode->i_dentry, d_alias) { + dname = &d->d_name; + if (dname->len != nlen + && memcmp(dname->name, name, nlen)) + continue; + err = 0; + spin_lock(&d->d_lock); + __d_drop(d); + au_digen_dec(d); + spin_unlock(&d->d_lock); + break; + } + spin_unlock(&dcache_lock); + } else { + au_fset_si(au_sbi(inode->i_sb), FAILED_REFRESH_DIRS); + d = d_find_alias(inode); + if (!d) { + au_iigen_dec(inode); + goto out; + } + + dname = &d->d_name; + if (dname->len == nlen && !memcmp(dname->name, name, nlen)) + err = hin_gen_tree(d); + dput(d); + } + + out: + AuTraceErr(err); + return err; +} + +static int hin_gen_by_name(struct dentry *dentry, const unsigned int isdir) +{ + int err; + struct inode *inode; + + inode = dentry->d_inode; + if (IS_ROOT(dentry) + /* || (inode && inode->i_ino == AUFS_ROOT_INO) */ + ) { + AuWarn("branch root dir was changed\n"); + return 0; + } + + err = 0; + if (!isdir) { + d_drop(dentry); + au_digen_dec(dentry); + if (inode) + au_iigen_dec(inode); + } else { + au_fset_si(au_sbi(dentry->d_sb), FAILED_REFRESH_DIRS); + if (inode) + err = hin_gen_tree(dentry); + } + + AuTraceErr(err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* hinotify job flags */ +#define AuHinJob_XINO0 1 +#define AuHinJob_GEN (1 << 1) +#define AuHinJob_DIRENT (1 << 2) +#define AuHinJob_ISDIR (1 << 3) +#define AuHinJob_TRYXINO0 (1 << 4) +#define AuHinJob_MNTPNT (1 << 5) +#define au_ftest_hinjob(flags, name) ((flags) & AuHinJob_##name) +#define au_fset_hinjob(flags, name) { (flags) |= AuHinJob_##name; } +#define au_fclr_hinjob(flags, name) { (flags) &= ~AuHinJob_##name; } + +struct hin_job_args { + unsigned int flags; + struct inode *inode, *h_inode, *dir, *h_dir; + struct dentry *dentry; + char *h_name; + int h_nlen; +}; + +static int hin_job(struct hin_job_args *a) +{ + const unsigned int isdir = au_ftest_hinjob(a->flags, ISDIR); + + /* reset xino */ + if (au_ftest_hinjob(a->flags, XINO0) && a->inode) + hin_xino(a->inode, a->h_inode); /* ignore this error */ + + if (au_ftest_hinjob(a->flags, TRYXINO0) + && a->inode + && a->h_inode) { + mutex_lock_nested(&a->h_inode->i_mutex, AuLsc_I_CHILD); + if (!a->h_inode->i_nlink) + hin_xino(a->inode, a->h_inode); /* ignore this error */ + mutex_unlock(&a->h_inode->i_mutex); + } + + /* make the generation obsolete */ + if (au_ftest_hinjob(a->flags, GEN)) { + int err = -1; + if (a->inode) + err = hin_gen_by_inode(a->h_name, a->h_nlen, a->inode, + isdir); + if (err && a->dentry) + hin_gen_by_name(a->dentry, isdir); + /* ignore this error */ + } + + /* make dir entries obsolete */ + if (au_ftest_hinjob(a->flags, DIRENT) && a->inode) { + struct au_vdir *vdir; + + vdir = au_ivdir(a->inode); + if (vdir) + vdir->vd_jiffy = 0; + /* IMustLock(a->inode); */ + /* a->inode->i_version++; */ + } + + /* can do nothing but warn */ + if (au_ftest_hinjob(a->flags, MNTPNT) + && a->dentry + && d_mountpoint(a->dentry)) + AuWarn("mount-point %.*s is removed or renamed\n", + AuDLNPair(a->dentry)); + + return 0; +} + +/* ---------------------------------------------------------------------- */ + +static char *in_name(u32 mask) +{ +#ifdef CONFIG_AUFS_DEBUG +#define test_ret(flag) if (mask & flag) \ + return #flag; + test_ret(IN_ACCESS); + test_ret(IN_MODIFY); + test_ret(IN_ATTRIB); + test_ret(IN_CLOSE_WRITE); + test_ret(IN_CLOSE_NOWRITE); + test_ret(IN_OPEN); + test_ret(IN_MOVED_FROM); + test_ret(IN_MOVED_TO); + test_ret(IN_CREATE); + test_ret(IN_DELETE); + test_ret(IN_DELETE_SELF); + test_ret(IN_MOVE_SELF); + test_ret(IN_UNMOUNT); + test_ret(IN_Q_OVERFLOW); + test_ret(IN_IGNORED); + return ""; +#undef test_ret +#else + return "??"; +#endif +} + +static struct dentry *lookup_wlock_by_name(char *name, unsigned int nlen, + struct inode *dir) +{ + struct dentry *dentry, *d, *parent; + struct qstr *dname; + + parent = d_find_alias(dir); + if (!parent) + return NULL; + + dentry = NULL; + spin_lock(&dcache_lock); + list_for_each_entry(d, &parent->d_subdirs, d_u.d_child) { + /* AuDbg("%.*s\n", AuDLNPair(d)); */ + dname = &d->d_name; + if (dname->len != nlen || memcmp(dname->name, name, nlen)) + continue; + if (!atomic_read(&d->d_count) || !d->d_fsdata) { + spin_lock(&d->d_lock); + __d_drop(d); + spin_unlock(&d->d_lock); + continue; + } + + dentry = dget(d); + break; + } + spin_unlock(&dcache_lock); + dput(parent); + + if (dentry) + di_write_lock_child(dentry); + + return dentry; +} + +static struct inode *lookup_wlock_by_ino(struct super_block *sb, + aufs_bindex_t bindex, ino_t h_ino) +{ + struct inode *inode; + ino_t ino; + int err; + + inode = NULL; + err = au_xino_read(sb, bindex, h_ino, &ino); + if (!err && ino) + inode = ilookup(sb, ino); + if (!inode) + goto out; + + if (unlikely(inode->i_ino == AUFS_ROOT_INO)) { + AuWarn("wrong root branch\n"); + iput(inode); + inode = NULL; + goto out; + } + + ii_write_lock_child(inode); + + out: + return inode; +} + +enum { CHILD, PARENT }; +struct postproc_args { + struct inode *h_dir, *dir, *h_child_inode; + u32 mask; + unsigned int flags[2]; + unsigned int h_child_nlen; + char h_child_name[]; +}; + +static void postproc(void *_args) +{ + struct postproc_args *a = _args; + struct super_block *sb; + aufs_bindex_t bindex, bend, bfound; + unsigned char xino, try_iput; + int err; + struct inode *inode; + ino_t h_ino; + struct hin_job_args args; + struct dentry *dentry; + struct au_sbinfo *sbinfo; + + AuDebugOn(!_args); + AuDebugOn(!a->h_dir); + AuDebugOn(!a->dir); + AuDebugOn(!a->mask); + AuDbg("mask 0x%x %s, i%lu, hi%lu, hci%lu\n", + a->mask, in_name(a->mask), a->dir->i_ino, a->h_dir->i_ino, + a->h_child_inode ? a->h_child_inode->i_ino : 0); + + inode = NULL; + dentry = NULL; + /* + * do not lock a->dir->i_mutex here + * because of d_revalidate() may cause a deadlock. + */ + sb = a->dir->i_sb; + AuDebugOn(!sb); + sbinfo = au_sbi(sb); + AuDebugOn(!sbinfo); + /* big aufs lock */ + si_noflush_write_lock(sb); + + ii_read_lock_parent(a->dir); + bfound = -1; + bend = au_ibend(a->dir); + for (bindex = au_ibstart(a->dir); bindex <= bend; bindex++) + if (au_h_iptr(a->dir, bindex) == a->h_dir) { + bfound = bindex; + break; + } + ii_read_unlock(a->dir); + if (unlikely(bfound < 0)) + goto out; + + xino = !!au_opt_test(au_mntflags(sb), XINO); + h_ino = 0; + if (a->h_child_inode) + h_ino = a->h_child_inode->i_ino; + + if (a->h_child_nlen + && (au_ftest_hinjob(a->flags[CHILD], GEN) + || au_ftest_hinjob(a->flags[CHILD], MNTPNT))) + dentry = lookup_wlock_by_name(a->h_child_name, a->h_child_nlen, + a->dir); + try_iput = 0; + if (dentry) + inode = dentry->d_inode; + if (xino && !inode && h_ino + && (au_ftest_hinjob(a->flags[CHILD], XINO0) + || au_ftest_hinjob(a->flags[CHILD], TRYXINO0) + || au_ftest_hinjob(a->flags[CHILD], GEN))) { + inode = lookup_wlock_by_ino(sb, bfound, h_ino); + try_iput = 1; + } + + args.flags = a->flags[CHILD]; + args.dentry = dentry; + args.inode = inode; + args.h_inode = a->h_child_inode; + args.dir = a->dir; + args.h_dir = a->h_dir; + args.h_name = a->h_child_name; + args.h_nlen = a->h_child_nlen; + err = hin_job(&args); + if (dentry) { + if (dentry->d_fsdata) + di_write_unlock(dentry); + dput(dentry); + } + if (inode && try_iput) { + ii_write_unlock(inode); + iput(inode); + } + + ii_write_lock_parent(a->dir); + args.flags = a->flags[PARENT]; + args.dentry = NULL; + args.inode = a->dir; + args.h_inode = a->h_dir; + args.dir = NULL; + args.h_dir = NULL; + args.h_name = NULL; + args.h_nlen = 0; + err = hin_job(&args); + ii_write_unlock(a->dir); + + out: + au_nwt_done(&sbinfo->si_nowait); + si_write_unlock(sb); + + iput(a->h_child_inode); + iput(a->h_dir); + iput(a->dir); + kfree(a); +} + +/* ---------------------------------------------------------------------- */ + +static void aufs_inotify(struct inotify_watch *watch, u32 wd __maybe_unused, + u32 mask, u32 cookie __maybe_unused, + const char *h_child_name, struct inode *h_child_inode) +{ + struct au_hinotify *hinotify; + struct postproc_args *args; + int len, wkq_err; + unsigned char isdir, isroot, wh; + char *p; + struct inode *dir; + unsigned int flags[2]; + + /* if IN_UNMOUNT happens, there must be another bug */ + AuDebugOn(mask & IN_UNMOUNT); + if (mask & (IN_IGNORED | IN_UNMOUNT)) { + put_inotify_watch(watch); + return; + } +#ifdef AuDbgHinotify + au_debug(1); + if (1 || !h_child_name || strcmp(h_child_name, AUFS_XINO_FNAME)) { + AuDbg("i%lu, wd %d, mask 0x%x %s, cookie 0x%x, hcname %s," + " hi%lu\n", + watch->inode->i_ino, wd, mask, in_name(mask), cookie, + h_child_name ? h_child_name : "", + h_child_inode ? h_child_inode->i_ino : 0); + WARN_ON(1); + } + au_debug(0); +#endif + + hinotify = container_of(watch, struct au_hinotify, hin_watch); + AuDebugOn(!hinotify || !hinotify->hin_aufs_inode); + dir = igrab(hinotify->hin_aufs_inode); + if (!dir) + return; + + isroot = (dir->i_ino == AUFS_ROOT_INO); + len = 0; + wh = 0; + if (h_child_name) { + len = strlen(h_child_name); + if (!memcmp(h_child_name, AUFS_WH_PFX, AUFS_WH_PFX_LEN)) { + h_child_name += AUFS_WH_PFX_LEN; + len -= AUFS_WH_PFX_LEN; + wh = 1; + } + } + + isdir = 0; + if (h_child_inode) + isdir = !!S_ISDIR(h_child_inode->i_mode); + flags[PARENT] = AuHinJob_ISDIR; + flags[CHILD] = 0; + if (isdir) + flags[CHILD] = AuHinJob_ISDIR; + switch (mask & IN_ALL_EVENTS) { + case IN_MOVED_FROM: + case IN_MOVED_TO: + AuDebugOn(!h_child_name || !h_child_inode); + au_fset_hinjob(flags[CHILD], GEN); + au_fset_hinjob(flags[CHILD], XINO0); + au_fset_hinjob(flags[CHILD], MNTPNT); + au_fset_hinjob(flags[PARENT], DIRENT); + break; + + case IN_CREATE: + AuDebugOn(!h_child_name || !h_child_inode); + au_fset_hinjob(flags[PARENT], DIRENT); + au_fset_hinjob(flags[CHILD], GEN); + break; + + case IN_DELETE: + /* + * aufs never be able to get this child inode. + * revalidation should be in d_revalidate() + * by checking i_nlink, i_generation or d_unhashed(). + */ + AuDebugOn(!h_child_name); + au_fset_hinjob(flags[PARENT], DIRENT); + au_fset_hinjob(flags[CHILD], GEN); + au_fset_hinjob(flags[CHILD], TRYXINO0); + au_fset_hinjob(flags[CHILD], MNTPNT); + break; + + default: + AuDebugOn(1); + } + + if (wh) + h_child_inode = NULL; + + /* iput() and kfree() will be called in postproc() */ + /* + * inotify_mutex is already acquired and kmalloc/prune_icache may lock + * iprune_mutex. strange. + */ + lockdep_off(); + args = kmalloc(sizeof(*args) + len + 1, GFP_NOFS); + lockdep_on(); + if (unlikely(!args)) { + AuErr1("no memory\n"); + iput(dir); + return; + } + args->flags[PARENT] = flags[PARENT]; + args->flags[CHILD] = flags[CHILD]; + args->mask = mask; + args->dir = dir; + args->h_dir = igrab(watch->inode); + if (h_child_inode) + h_child_inode = igrab(h_child_inode); /* can be NULL */ + args->h_child_inode = h_child_inode; + args->h_child_nlen = len; + if (len) { + p = (void *)args; + p += sizeof(*args); + memcpy(p, h_child_name, len + 1); + } + + lockdep_off(); + wkq_err = au_wkq_nowait(postproc, args, dir->i_sb); + lockdep_on(); + if (unlikely(wkq_err)) + AuErr("wkq %d\n", wkq_err); +} + +static void aufs_inotify_destroy(struct inotify_watch *watch __maybe_unused) +{ + return; +} + +static struct inotify_operations aufs_inotify_ops = { + .handle_event = aufs_inotify, + .destroy_watch = aufs_inotify_destroy +}; + +/* ---------------------------------------------------------------------- */ + +static void au_hin_destroy_cache(void) +{ + kmem_cache_destroy(au_cachep[AuCache_HINOTIFY]); + au_cachep[AuCache_HINOTIFY] = NULL; +} + +int __init au_hinotify_init(void) +{ + int err; + + err = -ENOMEM; + au_cachep[AuCache_HINOTIFY] = AuCache(au_hinotify); + if (au_cachep[AuCache_HINOTIFY]) { + err = 0; + au_hin_handle = inotify_init(&aufs_inotify_ops); + if (IS_ERR(au_hin_handle)) { + err = PTR_ERR(au_hin_handle); + au_hin_destroy_cache(); + } + } + AuTraceErr(err); + return err; +} + +void au_hinotify_fin(void) +{ + inotify_destroy(au_hin_handle); + if (au_cachep[AuCache_HINOTIFY]) + au_hin_destroy_cache(); +} diff -urN ./fs/aufs/i_op.c ./fs/aufs/i_op.c --- ./fs/aufs/i_op.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/i_op.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,862 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode operations (except add/del/rename) + */ + +#include +#include +#include +#include "aufs.h" + +static int h_permission(struct inode *h_inode, int mask, + struct vfsmount *h_mnt, int brperm) +{ + int err; + const unsigned char write_mask = !!(mask & (MAY_WRITE | MAY_APPEND)); + + err = -EACCES; + if ((write_mask && IS_IMMUTABLE(h_inode)) + || ((mask & MAY_EXEC) + && S_ISREG(h_inode->i_mode) + && ((h_mnt->mnt_flags & MNT_NOEXEC) + || !(h_inode->i_mode & S_IXUGO)))) + goto out; + + /* + * - skip the lower fs test in the case of write to ro branch. + * - nfs dir permission write check is optimized, but a policy for + * link/rename requires a real check. + */ + if ((write_mask && !au_br_writable(brperm)) + || (au_test_nfs(h_inode->i_sb) && S_ISDIR(h_inode->i_mode) + && write_mask && !(mask & MAY_READ)) + || !h_inode->i_op->permission) { + /* AuLabel(generic_permission); */ + err = generic_permission(h_inode, mask, NULL); + } else { + /* AuLabel(h_inode->permission); */ + err = h_inode->i_op->permission(h_inode, mask); + AuTraceErr(err); + } + + if (!err) + err = devcgroup_inode_permission(h_inode, mask); + if (!err) + err = security_inode_permission + (h_inode, mask & (MAY_READ | MAY_WRITE | MAY_EXEC + | MAY_APPEND)); + + out: + return err; +} + +static int aufs_permission(struct inode *inode, int mask) +{ + int err; + aufs_bindex_t bindex, bend; + const unsigned char isdir = !!S_ISDIR(inode->i_mode); + const unsigned char write_mask = !!(mask & (MAY_WRITE | MAY_APPEND)); + struct inode *h_inode; + struct super_block *sb; + struct au_branch *br; + + sb = inode->i_sb; + si_read_lock(sb, AuLock_FLUSH); + ii_read_lock_child(inode); + + if (!isdir || write_mask) { + h_inode = au_h_iptr(inode, au_ibstart(inode)); + AuDebugOn(!h_inode + || ((h_inode->i_mode & S_IFMT) + != (inode->i_mode & S_IFMT))); + err = 0; + bindex = au_ibstart(inode); + br = au_sbr(sb, bindex); + err = h_permission(h_inode, mask, br->br_mnt, br->br_perm); + + if (write_mask && !err) { + /* test whether the upper writable branch exists */ + err = -EROFS; + for (; bindex >= 0; bindex--) + if (!au_br_rdonly(au_sbr(sb, bindex))) { + err = 0; + break; + } + } + goto out; + } + + /* non-write to dir */ + err = 0; + bend = au_ibend(inode); + for (bindex = au_ibstart(inode); !err && bindex <= bend; bindex++) { + h_inode = au_h_iptr(inode, bindex); + if (h_inode) { + AuDebugOn(!S_ISDIR(h_inode->i_mode)); + br = au_sbr(sb, bindex); + err = h_permission(h_inode, mask, br->br_mnt, + br->br_perm); + } + } + + out: + ii_read_unlock(inode); + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static struct dentry *aufs_lookup(struct inode *dir, struct dentry *dentry, + struct nameidata *nd) +{ + struct dentry *ret, *parent; + struct inode *inode, *h_inode; + struct mutex *mtx; + struct super_block *sb; + int err, npositive; + aufs_bindex_t bstart; + + /* temporary workaround for a bug in NFSD readdir */ + if (!au_test_nfsd(current)) + IMustLock(dir); + else + WARN_ONCE(!mutex_is_locked(&dir->i_mutex), + "a known problem of NFSD readdir since 2.6.28\n"); + + sb = dir->i_sb; + si_read_lock(sb, AuLock_FLUSH); + err = au_alloc_dinfo(dentry); + ret = ERR_PTR(err); + if (unlikely(err)) + goto out; + + parent = dentry->d_parent; /* dir inode is locked */ + di_read_lock_parent(parent, AuLock_IR); + npositive = au_lkup_dentry(dentry, au_dbstart(parent), /*type*/0, nd); + di_read_unlock(parent, AuLock_IR); + err = npositive; + ret = ERR_PTR(err); + if (unlikely(err < 0)) + goto out_unlock; + + inode = NULL; + if (npositive) { + bstart = au_dbstart(dentry); + h_inode = au_h_dptr(dentry, bstart)->d_inode; + if (!S_ISDIR(h_inode->i_mode)) { + /* + * stop 'race'-ing between hardlinks under different + * parents. + */ + mtx = &au_sbr(sb, bstart)->br_xino.xi_nondir_mtx; + mutex_lock(mtx); + inode = au_new_inode(dentry, /*must_new*/0); + mutex_unlock(mtx); + } else + inode = au_new_inode(dentry, /*must_new*/0); + ret = (void *)inode; + } + if (IS_ERR(inode)) + goto out_unlock; + + ret = d_splice_alias(inode, dentry); + if (unlikely(IS_ERR(ret) && inode)) + ii_write_unlock(inode); + au_store_oflag(nd, inode); + + out_unlock: + di_write_unlock(dentry); + out: + si_read_unlock(sb); + return ret; +} + +/* ---------------------------------------------------------------------- */ + +static int au_wr_dir_cpup(struct dentry *dentry, struct dentry *parent, + const unsigned char add_entry, aufs_bindex_t bcpup, + aufs_bindex_t bstart) +{ + int err; + struct dentry *h_parent; + struct inode *h_dir; + + if (add_entry) { + au_update_dbstart(dentry); + IMustLock(parent->d_inode); + } else + di_write_lock_parent(parent); + + err = 0; + if (!au_h_dptr(parent, bcpup)) { + if (bstart < bcpup) + err = au_cpdown_dirs(dentry, bcpup); + else + err = au_cpup_dirs(dentry, bcpup); + } + if (!err && add_entry) { + h_parent = au_h_dptr(parent, bcpup); + h_dir = h_parent->d_inode; + mutex_lock_nested(&h_dir->i_mutex, AuLsc_I_PARENT); + err = au_lkup_neg(dentry, bcpup); + /* todo: no unlock here */ + mutex_unlock(&h_dir->i_mutex); + if (bstart < bcpup && au_dbstart(dentry) < 0) { + au_set_dbstart(dentry, 0); + au_update_dbrange(dentry, /*do_put_zero*/0); + } + } + + if (!add_entry) + di_write_unlock(parent); + if (!err) + err = bcpup; /* success */ + + return err; +} + +/* + * decide the branch and the parent dir where we will create a new entry. + * returns new bindex or an error. + * copyup the parent dir if needed. + */ +int au_wr_dir(struct dentry *dentry, struct dentry *src_dentry, + struct au_wr_dir_args *args) +{ + int err; + aufs_bindex_t bcpup, bstart, src_bstart; + const unsigned char add_entry = !!au_ftest_wrdir(args->flags, + ADD_ENTRY); + struct super_block *sb; + struct dentry *parent; + struct au_sbinfo *sbinfo; + + sb = dentry->d_sb; + sbinfo = au_sbi(sb); + parent = dget_parent(dentry); + bstart = au_dbstart(dentry); + bcpup = bstart; + if (args->force_btgt < 0) { + if (src_dentry) { + src_bstart = au_dbstart(src_dentry); + if (src_bstart < bstart) + bcpup = src_bstart; + } else if (add_entry) { + err = AuWbrCreate(sbinfo, dentry, + au_ftest_wrdir(args->flags, ISDIR)); + bcpup = err; + } + + if (bcpup < 0 || au_test_ro(sb, bcpup, dentry->d_inode)) { + if (add_entry) + err = AuWbrCopyup(sbinfo, dentry); + else { + if (!IS_ROOT(dentry)) { + di_read_lock_parent(parent, !AuLock_IR); + err = AuWbrCopyup(sbinfo, dentry); + di_read_unlock(parent, !AuLock_IR); + } else + err = AuWbrCopyup(sbinfo, dentry); + } + bcpup = err; + if (unlikely(err < 0)) + goto out; + } + } else { + bcpup = args->force_btgt; + AuDebugOn(au_test_ro(sb, bcpup, dentry->d_inode)); + } + AuDbg("bstart %d, bcpup %d\n", bstart, bcpup); + if (bstart < bcpup) + au_update_dbrange(dentry, /*do_put_zero*/1); + + err = bcpup; + if (bcpup == bstart) + goto out; /* success */ + + /* copyup the new parent into the branch we process */ + err = au_wr_dir_cpup(dentry, parent, add_entry, bcpup, bstart); + + out: + dput(parent); + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct dentry *au_pinned_h_parent(struct au_pin *pin) +{ + if (pin && pin->parent) + return au_h_dptr(pin->parent, pin->bindex); + return NULL; +} + +void au_unpin(struct au_pin *p) +{ + if (au_ftest_pin(p->flags, MNT_WRITE)) + mnt_drop_write(p->h_mnt); + if (!p->hdir) + return; + + au_hin_imtx_unlock(p->hdir); + if (!au_ftest_pin(p->flags, DI_LOCKED)) + di_read_unlock(p->parent, AuLock_IR); + iput(p->hdir->hi_inode); + dput(p->parent); + p->parent = NULL; + p->hdir = NULL; + p->h_mnt = NULL; +} + +int au_do_pin(struct au_pin *p) +{ + int err; + struct super_block *sb; + struct dentry *h_dentry, *h_parent; + struct au_branch *br; + struct inode *h_dir; + + err = 0; + sb = p->dentry->d_sb; + br = au_sbr(sb, p->bindex); + if (IS_ROOT(p->dentry)) { + if (au_ftest_pin(p->flags, MNT_WRITE)) { + p->h_mnt = br->br_mnt; + err = mnt_want_write(p->h_mnt); + if (unlikely(err)) { + au_fclr_pin(p->flags, MNT_WRITE); + goto out_err; + } + } + goto out; + } + + h_dentry = NULL; + if (p->bindex <= au_dbend(p->dentry)) + h_dentry = au_h_dptr(p->dentry, p->bindex); + + p->parent = dget_parent(p->dentry); + if (!au_ftest_pin(p->flags, DI_LOCKED)) + di_read_lock(p->parent, AuLock_IR, p->lsc_di); + + h_dir = NULL; + h_parent = au_h_dptr(p->parent, p->bindex); + p->hdir = au_hi(p->parent->d_inode, p->bindex); + if (p->hdir) + h_dir = p->hdir->hi_inode; + + /* udba case */ + if (unlikely(!p->hdir || !h_dir)) { + err = au_busy_or_stale(); + if (!au_ftest_pin(p->flags, DI_LOCKED)) + di_read_unlock(p->parent, AuLock_IR); + dput(p->parent); + p->parent = NULL; + goto out_err; + } + + au_igrab(h_dir); + au_hin_imtx_lock_nested(p->hdir, p->lsc_hi); + + if (h_dentry) { + err = au_h_verify(h_dentry, p->udba, h_dir, h_parent, br); + if (unlikely(err)) { + au_fclr_pin(p->flags, MNT_WRITE); + goto out_unpin; + } + } + + if (au_ftest_pin(p->flags, MNT_WRITE)) { + p->h_mnt = br->br_mnt; + err = mnt_want_write(p->h_mnt); + if (unlikely(err)) { + au_fclr_pin(p->flags, MNT_WRITE); + goto out_unpin; + } + } + goto out; /* success */ + + out_unpin: + au_unpin(p); + out_err: + AuErr("err %d\n", err); + err = au_busy_or_stale(); + out: + return err; +} + +void au_pin_init(struct au_pin *p, struct dentry *dentry, + aufs_bindex_t bindex, int lsc_di, int lsc_hi, + unsigned int udba, unsigned char flags) +{ + p->dentry = dentry; + p->udba = udba; + p->lsc_di = lsc_di; + p->lsc_hi = lsc_hi; + p->flags = flags; + p->bindex = bindex; + + p->parent = NULL; + p->hdir = NULL; + p->h_mnt = NULL; +} + +int au_pin(struct au_pin *pin, struct dentry *dentry, aufs_bindex_t bindex, + unsigned int udba, unsigned char flags) +{ + au_pin_init(pin, dentry, bindex, AuLsc_DI_PARENT, AuLsc_I_PARENT2, + udba, flags); + return au_do_pin(pin); +} + +/* ---------------------------------------------------------------------- */ + +#define AuIcpup_DID_CPUP 1 +#define au_ftest_icpup(flags, name) ((flags) & AuIcpup_##name) +#define au_fset_icpup(flags, name) { (flags) |= AuIcpup_##name; } +#define au_fclr_icpup(flags, name) { (flags) &= ~AuIcpup_##name; } + +struct au_icpup_args { + unsigned char flags; + unsigned char pin_flags; + aufs_bindex_t btgt; + struct au_pin pin; + struct path h_path; + struct inode *h_inode; +}; + +static int au_lock_and_icpup(struct dentry *dentry, struct iattr *ia, + struct au_icpup_args *a) +{ + int err; + unsigned int udba; + loff_t sz; + aufs_bindex_t bstart; + struct dentry *hi_wh, *parent; + struct inode *inode; + struct au_wr_dir_args wr_dir_args = { + .force_btgt = -1, + .flags = 0 + }; + + di_write_lock_child(dentry); + bstart = au_dbstart(dentry); + inode = dentry->d_inode; + if (S_ISDIR(inode->i_mode)) + au_fset_wrdir(wr_dir_args.flags, ISDIR); + /* plink or hi_wh() case */ + if (bstart != au_ibstart(inode)) + wr_dir_args.force_btgt = au_ibstart(inode); + err = au_wr_dir(dentry, /*src_dentry*/NULL, &wr_dir_args); + if (unlikely(err < 0)) + goto out_dentry; + a->btgt = err; + if (err != bstart) + au_fset_icpup(a->flags, DID_CPUP); + + err = 0; + a->pin_flags = AuPin_MNT_WRITE; + parent = NULL; + if (!IS_ROOT(dentry)) { + au_fset_pin(a->pin_flags, DI_LOCKED); + parent = dget_parent(dentry); + di_write_lock_parent(parent); + } + + udba = au_opt_udba(dentry->d_sb); + if (d_unhashed(dentry) || (ia->ia_valid & ATTR_FILE)) + udba = AuOpt_UDBA_NONE; + err = au_pin(&a->pin, dentry, a->btgt, udba, a->pin_flags); + if (unlikely(err)) { + if (parent) { + di_write_unlock(parent); + dput(parent); + } + goto out_dentry; + } + a->h_path.dentry = au_h_dptr(dentry, bstart); + a->h_inode = a->h_path.dentry->d_inode; + mutex_lock_nested(&a->h_inode->i_mutex, AuLsc_I_CHILD); + sz = -1; + if ((ia->ia_valid & ATTR_SIZE) && ia->ia_size < i_size_read(a->h_inode)) + sz = ia->ia_size; + + hi_wh = NULL; + if (au_ftest_icpup(a->flags, DID_CPUP) && d_unhashed(dentry)) { + hi_wh = au_hi_wh(inode, a->btgt); + if (!hi_wh) { + err = au_sio_cpup_wh(dentry, a->btgt, sz, /*file*/NULL); + if (unlikely(err)) + goto out_unlock; + hi_wh = au_hi_wh(inode, a->btgt); + /* todo: revalidate hi_wh? */ + } + } + + if (parent) { + au_pin_set_parent_lflag(&a->pin, /*lflag*/0); + di_downgrade_lock(parent, AuLock_IR); + dput(parent); + } + if (!au_ftest_icpup(a->flags, DID_CPUP)) + goto out; /* success */ + + if (!d_unhashed(dentry)) { + err = au_sio_cpup_simple(dentry, a->btgt, sz, AuCpup_DTIME); + if (!err) + a->h_path.dentry = au_h_dptr(dentry, a->btgt); + } else if (!hi_wh) + a->h_path.dentry = au_h_dptr(dentry, a->btgt); + else + a->h_path.dentry = hi_wh; /* do not dget here */ + + out_unlock: + mutex_unlock(&a->h_inode->i_mutex); + a->h_inode = a->h_path.dentry->d_inode; + if (!err) { + mutex_lock_nested(&a->h_inode->i_mutex, AuLsc_I_CHILD); + goto out; /* success */ + } + + au_unpin(&a->pin); + + out_dentry: + di_write_unlock(dentry); + out: + return err; +} + +static int aufs_setattr(struct dentry *dentry, struct iattr *ia) +{ + int err; + struct inode *inode; + struct super_block *sb; + struct file *file; + struct au_icpup_args *a; + + err = -ENOMEM; + a = kzalloc(sizeof(*a), GFP_NOFS); + if (unlikely(!a)) + goto out; + + inode = dentry->d_inode; + IMustLock(inode); + sb = dentry->d_sb; + si_read_lock(sb, AuLock_FLUSH); + + file = NULL; + if (ia->ia_valid & ATTR_FILE) { + /* currently ftruncate(2) only */ + file = ia->ia_file; + fi_write_lock(file); + ia->ia_file = au_h_fptr(file, au_fbstart(file)); + } + + if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) + ia->ia_valid &= ~ATTR_MODE; + + err = au_lock_and_icpup(dentry, ia, a); + if (unlikely(err < 0)) + goto out_si; + if (au_ftest_icpup(a->flags, DID_CPUP)) { + ia->ia_file = NULL; + ia->ia_valid &= ~ATTR_FILE; + } + + a->h_path.mnt = au_sbr_mnt(sb, a->btgt); + if (ia->ia_valid & ATTR_SIZE) { + struct file *f; + + if (ia->ia_size < i_size_read(inode)) { + /* unmap only */ + err = vmtruncate(inode, ia->ia_size); + if (unlikely(err)) + goto out_unlock; + } + + f = NULL; + if (ia->ia_valid & ATTR_FILE) + f = ia->ia_file; + mutex_unlock(&a->h_inode->i_mutex); + err = vfsub_trunc(&a->h_path, ia->ia_size, ia->ia_valid, f); + mutex_lock_nested(&a->h_inode->i_mutex, AuLsc_I_CHILD); + } else + err = vfsub_notify_change(&a->h_path, ia); + if (!err) + au_cpup_attr_changeable(inode); + + out_unlock: + mutex_unlock(&a->h_inode->i_mutex); + au_unpin(&a->pin); + di_write_unlock(dentry); + out_si: + if (file) { + fi_write_unlock(file); + ia->ia_file = file; + ia->ia_valid |= ATTR_FILE; + } + si_read_unlock(sb); + kfree(a); + out: + return err; +} + +static int au_getattr_lock_reval(struct dentry *dentry, unsigned int sigen) +{ + int err; + struct inode *inode; + struct dentry *parent; + + err = 0; + inode = dentry->d_inode; + di_write_lock_child(dentry); + if (au_digen(dentry) != sigen || au_iigen(inode) != sigen) { + parent = dget_parent(dentry); + di_read_lock_parent(parent, AuLock_IR); + /* returns a number of positive dentries */ + err = au_refresh_hdentry(dentry, inode->i_mode & S_IFMT); + if (err > 0) + err = au_refresh_hinode(inode, dentry); + di_read_unlock(parent, AuLock_IR); + dput(parent); + if (unlikely(!err)) + err = -EIO; + } + di_downgrade_lock(dentry, AuLock_IR); + + return err; +} + +static void au_refresh_iattr(struct inode *inode, struct kstat *st, + unsigned int nlink) +{ + inode->i_mode = st->mode; + inode->i_uid = st->uid; + inode->i_gid = st->gid; + inode->i_atime = st->atime; + inode->i_mtime = st->mtime; + inode->i_ctime = st->ctime; + + au_cpup_attr_nlink(inode, /*force*/0); + if (S_ISDIR(inode->i_mode)) { + inode->i_nlink -= nlink; + inode->i_nlink += st->nlink; + } + + spin_lock(&inode->i_lock); + inode->i_blocks = st->blocks; + i_size_write(inode, st->size); + spin_unlock(&inode->i_lock); +} + +static int aufs_getattr(struct vfsmount *mnt __maybe_unused, + struct dentry *dentry, struct kstat *st) +{ + int err; + unsigned int mnt_flags; + aufs_bindex_t bindex; + unsigned char udba_none, positive, did_lock; + struct super_block *sb, *h_sb; + struct inode *inode; + struct vfsmount *h_mnt; + struct dentry *h_dentry; + + err = 0; + did_lock = 0; + sb = dentry->d_sb; + inode = dentry->d_inode; + si_read_lock(sb, AuLock_FLUSH); + if (IS_ROOT(dentry)) { + /* lock free root dinfo */ + h_dentry = dget(au_di(dentry)->di_hdentry->hd_dentry); + h_mnt = au_sbr_mnt(sb, 0); + goto getattr; + } + + did_lock = 1; + mnt_flags = au_mntflags(sb); + udba_none = !!au_opt_test(mnt_flags, UDBA_NONE); + + /* support fstat(2) */ + if (!d_unhashed(dentry) && !udba_none) { + unsigned int sigen = au_sigen(sb); + if (au_digen(dentry) == sigen && au_iigen(inode) == sigen) + di_read_lock_child(dentry, AuLock_IR); + else { + err = au_getattr_lock_reval(dentry, sigen); + if (unlikely(err)) + goto out; + } + } else + di_read_lock_child(dentry, AuLock_IR); + + bindex = au_ibstart(inode); + h_mnt = au_sbr_mnt(sb, bindex); + h_sb = h_mnt->mnt_sb; + if (!au_test_fs_bad_iattr(h_sb) && udba_none) + goto out_fill; /* success */ + + if (au_dbstart(dentry) == bindex) + h_dentry = dget(au_h_dptr(dentry, bindex)); + else if (au_opt_test(mnt_flags, PLINK) && au_plink_test(inode)) { + h_dentry = au_plink_lkup(inode, bindex); + if (IS_ERR(h_dentry)) + goto out_fill; /* pretending success */ + } else + /* illegally overlapped or something */ + goto out_fill; /* pretending success */ + + getattr: + positive = !!h_dentry->d_inode; + if (positive) + err = vfs_getattr(h_mnt, h_dentry, st); + dput(h_dentry); + if (!err) { + if (positive) + au_refresh_iattr(inode, st, h_dentry->d_inode->i_nlink); + goto out_fill; /* success */ + } + goto out; + + out_fill: + generic_fillattr(inode, st); + out: + if (did_lock) + di_read_unlock(dentry, AuLock_IR); + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int h_readlink(struct dentry *dentry, int bindex, char __user *buf, + int bufsiz) +{ + int err; + struct super_block *sb; + struct dentry *h_dentry; + + err = -EINVAL; + h_dentry = au_h_dptr(dentry, bindex); + if (unlikely(/* !h_dentry + || !h_dentry->d_inode + || !h_dentry->d_inode->i_op + || */ !h_dentry->d_inode->i_op->readlink)) + goto out; + + err = security_inode_readlink(h_dentry); + if (unlikely(err)) + goto out; + + sb = dentry->d_sb; + if (!au_test_ro(sb, bindex, dentry->d_inode)) { + vfsub_touch_atime(au_sbr_mnt(sb, bindex), h_dentry); + fsstack_copy_attr_atime(dentry->d_inode, h_dentry->d_inode); + } + err = h_dentry->d_inode->i_op->readlink(h_dentry, buf, bufsiz); + + out: + return err; +} + +static int aufs_readlink(struct dentry *dentry, char __user *buf, int bufsiz) +{ + int err; + + aufs_read_lock(dentry, AuLock_IR); + err = h_readlink(dentry, au_dbstart(dentry), buf, bufsiz); + aufs_read_unlock(dentry, AuLock_IR); + + return err; +} + +static void *aufs_follow_link(struct dentry *dentry, struct nameidata *nd) +{ + int err; + char *buf; + mm_segment_t old_fs; + + err = -ENOMEM; + buf = __getname(); + if (unlikely(!buf)) + goto out; + + aufs_read_lock(dentry, AuLock_IR); + old_fs = get_fs(); + set_fs(KERNEL_DS); + err = h_readlink(dentry, au_dbstart(dentry), (char __user *)buf, + PATH_MAX); + set_fs(old_fs); + aufs_read_unlock(dentry, AuLock_IR); + + if (err >= 0) { + buf[err] = 0; + /* will be freed by put_link */ + nd_set_link(nd, buf); + return NULL; /* success */ + } + __putname(buf); + + out: + path_put(&nd->path); + AuTraceErr(err); + return ERR_PTR(err); +} + +static void aufs_put_link(struct dentry *dentry __maybe_unused, + struct nameidata *nd, void *cookie __maybe_unused) +{ + __putname(nd_get_link(nd)); +} + +/* ---------------------------------------------------------------------- */ + +static void aufs_truncate_range(struct inode *inode __maybe_unused, + loff_t start __maybe_unused, + loff_t end __maybe_unused) +{ + AuUnsupport(); +} + +/* ---------------------------------------------------------------------- */ + +struct inode_operations aufs_symlink_iop = { + .permission = aufs_permission, + .setattr = aufs_setattr, + .getattr = aufs_getattr, + .readlink = aufs_readlink, + .follow_link = aufs_follow_link, + .put_link = aufs_put_link +}; + +struct inode_operations aufs_dir_iop = { + .create = aufs_create, + .lookup = aufs_lookup, + .link = aufs_link, + .unlink = aufs_unlink, + .symlink = aufs_symlink, + .mkdir = aufs_mkdir, + .rmdir = aufs_rmdir, + .mknod = aufs_mknod, + .rename = aufs_rename, + + .permission = aufs_permission, + .setattr = aufs_setattr, + .getattr = aufs_getattr +}; + +struct inode_operations aufs_iop = { + .permission = aufs_permission, + .setattr = aufs_setattr, + .getattr = aufs_getattr, + .truncate_range = aufs_truncate_range +}; diff -urN ./fs/aufs/i_op_add.c ./fs/aufs/i_op_add.c --- ./fs/aufs/i_op_add.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/i_op_add.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,625 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode operations (add entry) + */ + +#include "aufs.h" + +/* + * final procedure of adding a new entry, except link(2). + * remove whiteout, instantiate, copyup the parent dir's times and size + * and update version. + * if it failed, re-create the removed whiteout. + */ +static int epilog(struct inode *dir, aufs_bindex_t bindex, + struct dentry *wh_dentry, struct dentry *dentry) +{ + int err, rerr; + aufs_bindex_t bwh; + struct path h_path; + struct inode *inode, *h_dir; + struct dentry *wh; + + bwh = -1; + if (wh_dentry) { + h_dir = wh_dentry->d_parent->d_inode; /* dir inode is locked */ + IMustLock(h_dir); + AuDebugOn(au_h_iptr(dir, bindex) != h_dir); + bwh = au_dbwh(dentry); + h_path.dentry = wh_dentry; + h_path.mnt = au_sbr_mnt(dir->i_sb, bindex); + err = au_wh_unlink_dentry(au_h_iptr(dir, bindex), &h_path, + dentry); + if (unlikely(err)) + goto out; + } + + inode = au_new_inode(dentry, /*must_new*/1); + if (!IS_ERR(inode)) { + d_instantiate(dentry, inode); + dir = dentry->d_parent->d_inode; /* dir inode is locked */ + IMustLock(dir); + if (au_ibstart(dir) == au_dbstart(dentry)) + au_cpup_attr_timesizes(dir); + dir->i_version++; + return 0; /* success */ + } + + err = PTR_ERR(inode); + if (!wh_dentry) + goto out; + + /* revert */ + /* dir inode is locked */ + wh = au_wh_create(dentry, bwh, wh_dentry->d_parent); + rerr = PTR_ERR(wh); + if (IS_ERR(wh)) { + AuIOErr("%.*s reverting whiteout failed(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + err = -EIO; + } else + dput(wh); + + out: + return err; +} + +/* + * simple tests for the adding inode operations. + * following the checks in vfs, plus the parent-child relationship. + */ +int au_may_add(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent, int isdir) +{ + int err; + umode_t h_mode; + struct dentry *h_dentry; + struct inode *h_inode; + + h_dentry = au_h_dptr(dentry, bindex); + h_inode = h_dentry->d_inode; + if (!dentry->d_inode) { + err = -EEXIST; + if (unlikely(h_inode)) + goto out; + } else { + /* rename(2) case */ + err = -EIO; + if (unlikely(!h_inode || !h_inode->i_nlink)) + goto out; + + h_mode = h_inode->i_mode; + if (!isdir) { + err = -EISDIR; + if (unlikely(S_ISDIR(h_mode))) + goto out; + } else if (unlikely(!S_ISDIR(h_mode))) { + err = -ENOTDIR; + goto out; + } + } + + err = -EIO; + /* expected parent dir is locked */ + if (unlikely(h_parent != h_dentry->d_parent)) + goto out; + err = 0; + + out: + return err; +} + +/* + * initial procedure of adding a new entry. + * prepare writable branch and the parent dir, lock it, + * and lookup whiteout for the new entry. + */ +static struct dentry* +lock_hdir_lkup_wh(struct dentry *dentry, struct au_dtime *dt, + struct dentry *src_dentry, struct au_pin *pin, + struct au_wr_dir_args *wr_dir_args) +{ + struct dentry *wh_dentry, *h_parent; + struct super_block *sb; + struct au_branch *br; + int err; + unsigned int udba; + aufs_bindex_t bcpup; + + err = au_wr_dir(dentry, src_dentry, wr_dir_args); + bcpup = err; + wh_dentry = ERR_PTR(err); + if (unlikely(err < 0)) + goto out; + + sb = dentry->d_sb; + udba = au_opt_udba(sb); + err = au_pin(pin, dentry, bcpup, udba, + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + wh_dentry = ERR_PTR(err); + if (unlikely(err)) + goto out; + + h_parent = au_pinned_h_parent(pin); + if (udba != AuOpt_UDBA_NONE + && au_dbstart(dentry) == bcpup) { + err = au_may_add(dentry, bcpup, h_parent, + au_ftest_wrdir(wr_dir_args->flags, ISDIR)); + wh_dentry = ERR_PTR(err); + if (unlikely(err)) + goto out_unpin; + } + + br = au_sbr(sb, bcpup); + if (dt) { + struct path tmp = { + .dentry = h_parent, + .mnt = br->br_mnt + }; + au_dtime_store(dt, au_pinned_parent(pin), &tmp); + } + + wh_dentry = NULL; + if (bcpup != au_dbwh(dentry)) + goto out; /* success */ + + wh_dentry = au_wh_lkup(h_parent, &dentry->d_name, br); + + out_unpin: + if (IS_ERR(wh_dentry)) + au_unpin(pin); + out: + return wh_dentry; +} + +/* ---------------------------------------------------------------------- */ + +enum { Mknod, Symlink, Creat }; +struct simple_arg { + int type; + union { + struct { + int mode; + struct nameidata *nd; + } c; + struct { + const char *symname; + } s; + struct { + int mode; + dev_t dev; + } m; + } u; +}; + +static int add_simple(struct inode *dir, struct dentry *dentry, + struct simple_arg *arg) +{ + int err; + aufs_bindex_t bstart; + unsigned char created; + struct au_dtime dt; + struct au_pin pin; + struct path h_path; + struct dentry *wh_dentry, *parent; + struct inode *h_dir; + struct au_wr_dir_args wr_dir_args = { + .force_btgt = -1, + .flags = AuWrDir_ADD_ENTRY + }; + + IMustLock(dir); + + parent = dentry->d_parent; /* dir inode is locked */ + aufs_read_lock(dentry, AuLock_DW); + di_write_lock_parent(parent); + wh_dentry = lock_hdir_lkup_wh(dentry, &dt, /*src_dentry*/NULL, &pin, + &wr_dir_args); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out; + + bstart = au_dbstart(dentry); + h_path.dentry = au_h_dptr(dentry, bstart); + h_path.mnt = au_sbr_mnt(dentry->d_sb, bstart); + h_dir = au_pinned_h_dir(&pin); + switch (arg->type) { + case Creat: + err = vfsub_create(h_dir, &h_path, arg->u.c.mode); + break; + case Symlink: + err = vfsub_symlink(h_dir, &h_path, arg->u.s.symname); + break; + case Mknod: + err = vfsub_mknod(h_dir, &h_path, arg->u.m.mode, arg->u.m.dev); + break; + default: + BUG(); + } + created = !err; + if (!err) + err = epilog(dir, bstart, wh_dentry, dentry); + + /* revert */ + if (unlikely(created && err && h_path.dentry->d_inode)) { + int rerr; + rerr = vfsub_unlink(h_dir, &h_path, /*force*/0); + if (rerr) { + AuIOErr("%.*s revert failure(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + err = -EIO; + } + au_dtime_revert(&dt); + d_drop(dentry); + } + + au_unpin(&pin); + dput(wh_dentry); + + out: + if (unlikely(err)) { + au_update_dbstart(dentry); + d_drop(dentry); + } + di_write_unlock(parent); + aufs_read_unlock(dentry, AuLock_DW); + return err; +} + +int aufs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) +{ + struct simple_arg arg = { + .type = Mknod, + .u.m = { + .mode = mode, + .dev = dev + } + }; + return add_simple(dir, dentry, &arg); +} + +int aufs_symlink(struct inode *dir, struct dentry *dentry, const char *symname) +{ + struct simple_arg arg = { + .type = Symlink, + .u.s.symname = symname + }; + return add_simple(dir, dentry, &arg); +} + +int aufs_create(struct inode *dir, struct dentry *dentry, int mode, + struct nameidata *nd) +{ + struct simple_arg arg = { + .type = Creat, + .u.c = { + .mode = mode, + .nd = nd + } + }; + return add_simple(dir, dentry, &arg); +} + +/* ---------------------------------------------------------------------- */ + +struct au_link_args { + aufs_bindex_t bdst, bsrc; + struct au_pin pin; + struct path h_path; + struct dentry *src_parent, *parent; +}; + +static int au_cpup_before_link(struct dentry *src_dentry, + struct au_link_args *a) +{ + int err; + struct dentry *h_src_dentry; + struct mutex *h_mtx; + + di_read_lock_parent(a->src_parent, AuLock_IR); + err = au_test_and_cpup_dirs(src_dentry, a->bdst); + if (unlikely(err)) + goto out; + + h_src_dentry = au_h_dptr(src_dentry, a->bsrc); + h_mtx = &h_src_dentry->d_inode->i_mutex; + err = au_pin(&a->pin, src_dentry, a->bdst, + au_opt_udba(src_dentry->d_sb), + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + if (unlikely(err)) + goto out; + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + err = au_sio_cpup_simple(src_dentry, a->bdst, -1, + AuCpup_DTIME /* | AuCpup_KEEPLINO */); + mutex_unlock(h_mtx); + au_unpin(&a->pin); + + out: + di_read_unlock(a->src_parent, AuLock_IR); + return err; +} + +static int au_cpup_or_link(struct dentry *src_dentry, struct au_link_args *a) +{ + int err; + unsigned char plink; + struct inode *h_inode, *inode; + struct dentry *h_src_dentry; + struct super_block *sb; + + plink = 0; + h_inode = NULL; + sb = src_dentry->d_sb; + inode = src_dentry->d_inode; + if (au_ibstart(inode) <= a->bdst) + h_inode = au_h_iptr(inode, a->bdst); + if (!h_inode || !h_inode->i_nlink) { + /* copyup src_dentry as the name of dentry. */ + au_set_dbstart(src_dentry, a->bdst); + au_set_h_dptr(src_dentry, a->bdst, dget(a->h_path.dentry)); + h_inode = au_h_dptr(src_dentry, a->bsrc)->d_inode; + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + err = au_sio_cpup_single(src_dentry, a->bdst, a->bsrc, -1, + AuCpup_KEEPLINO, a->parent); + mutex_unlock(&h_inode->i_mutex); + au_set_h_dptr(src_dentry, a->bdst, NULL); + au_set_dbstart(src_dentry, a->bsrc); + } else { + /* the inode of src_dentry already exists on a.bdst branch */ + h_src_dentry = d_find_alias(h_inode); + if (!h_src_dentry && au_plink_test(inode)) { + plink = 1; + h_src_dentry = au_plink_lkup(inode, a->bdst); + err = PTR_ERR(h_src_dentry); + if (IS_ERR(h_src_dentry)) + goto out; + + if (unlikely(!h_src_dentry->d_inode)) { + dput(h_src_dentry); + h_src_dentry = NULL; + } + + } + if (h_src_dentry) { + err = vfsub_link(h_src_dentry, au_pinned_h_dir(&a->pin), + &a->h_path); + dput(h_src_dentry); + } else { + AuIOErr("no dentry found for hi%lu on b%d\n", + h_inode->i_ino, a->bdst); + err = -EIO; + } + } + + if (!err && !plink) + au_plink_append(inode, a->bdst, a->h_path.dentry); + +out: + return err; +} + +int aufs_link(struct dentry *src_dentry, struct inode *dir, + struct dentry *dentry) +{ + int err, rerr; + struct au_dtime dt; + struct au_link_args *a; + struct dentry *wh_dentry, *h_src_dentry; + struct inode *inode; + struct super_block *sb; + struct au_wr_dir_args wr_dir_args = { + /* .force_btgt = -1, */ + .flags = AuWrDir_ADD_ENTRY + }; + + IMustLock(dir); + inode = src_dentry->d_inode; + IMustLock(inode); + + err = -ENOMEM; + a = kzalloc(sizeof(*a), GFP_NOFS); + if (unlikely(!a)) + goto out; + + a->parent = dentry->d_parent; /* dir inode is locked */ + aufs_read_and_write_lock2(dentry, src_dentry, /*AuLock_FLUSH*/0); + a->src_parent = dget_parent(src_dentry); + wr_dir_args.force_btgt = au_dbstart(src_dentry); + + di_write_lock_parent(a->parent); + wr_dir_args.force_btgt = au_wbr(dentry, wr_dir_args.force_btgt); + wh_dentry = lock_hdir_lkup_wh(dentry, &dt, src_dentry, &a->pin, + &wr_dir_args); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out_unlock; + + err = 0; + sb = dentry->d_sb; + a->bdst = au_dbstart(dentry); + a->h_path.dentry = au_h_dptr(dentry, a->bdst); + a->h_path.mnt = au_sbr_mnt(sb, a->bdst); + a->bsrc = au_dbstart(src_dentry); + if (au_opt_test(au_mntflags(sb), PLINK)) { + if (a->bdst < a->bsrc + /* && h_src_dentry->d_sb != a->h_path.dentry->d_sb */) + err = au_cpup_or_link(src_dentry, a); + else { + h_src_dentry = au_h_dptr(src_dentry, a->bdst); + err = vfsub_link(h_src_dentry, au_pinned_h_dir(&a->pin), + &a->h_path); + } + } else { + /* + * copyup src_dentry to the branch we process, + * and then link(2) to it. + */ + if (a->bdst < a->bsrc + /* && h_src_dentry->d_sb != a->h_path.dentry->d_sb */) { + au_unpin(&a->pin); + di_write_unlock(a->parent); + err = au_cpup_before_link(src_dentry, a); + if (!err) { + di_write_lock_parent(a->parent); + err = au_pin(&a->pin, dentry, a->bdst, + au_opt_udba(sb), + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + if (unlikely(err)) + goto out_wh; + } + } + if (!err) { + h_src_dentry = au_h_dptr(src_dentry, a->bdst); + err = vfsub_link(h_src_dentry, au_pinned_h_dir(&a->pin), + &a->h_path); + } + } + if (unlikely(err)) + goto out_unpin; + + if (wh_dentry) { + a->h_path.dentry = wh_dentry; + err = au_wh_unlink_dentry(au_pinned_h_dir(&a->pin), &a->h_path, + dentry); + if (unlikely(err)) + goto out_revert; + } + + dir->i_version++; + if (au_ibstart(dir) == au_dbstart(dentry)) + au_cpup_attr_timesizes(dir); + inc_nlink(inode); + inode->i_ctime = dir->i_ctime; + if (!d_unhashed(a->h_path.dentry)) + d_instantiate(dentry, au_igrab(inode)); + else + /* some filesystem calls d_drop() */ + d_drop(dentry); + goto out_unpin; /* success */ + + out_revert: + rerr = vfsub_unlink(au_pinned_h_dir(&a->pin), &a->h_path, /*force*/0); + if (!rerr) + goto out_dt; + AuIOErr("%.*s reverting failed(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + err = -EIO; + out_dt: + d_drop(dentry); + au_dtime_revert(&dt); + out_unpin: + au_unpin(&a->pin); + out_wh: + dput(wh_dentry); + out_unlock: + if (unlikely(err)) { + au_update_dbstart(dentry); + d_drop(dentry); + } + di_write_unlock(a->parent); + dput(a->src_parent); + aufs_read_and_write_unlock2(dentry, src_dentry); + kfree(a); + out: + return err; +} + +int aufs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + int err, rerr; + aufs_bindex_t bindex; + unsigned char diropq; + struct au_pin pin; + struct path h_path; + struct dentry *wh_dentry, *parent, *opq_dentry; + struct mutex *h_mtx; + struct super_block *sb; + struct au_dtime dt; + struct au_wr_dir_args wr_dir_args = { + .force_btgt = -1, + .flags = AuWrDir_ADD_ENTRY | AuWrDir_ISDIR + }; + + IMustLock(dir); + + aufs_read_lock(dentry, AuLock_DW); + parent = dentry->d_parent; /* dir inode is locked */ + di_write_lock_parent(parent); + wh_dentry = lock_hdir_lkup_wh(dentry, &dt, /*src_dentry*/NULL, &pin, + &wr_dir_args); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out; + + sb = dentry->d_sb; + bindex = au_dbstart(dentry); + h_path.dentry = au_h_dptr(dentry, bindex); + h_path.mnt = au_sbr_mnt(sb, bindex); + err = vfsub_mkdir(au_pinned_h_dir(&pin), &h_path, mode); + if (unlikely(err)) + goto out_unlock; + + /* make the dir opaque */ + diropq = 0; + h_mtx = &h_path.dentry->d_inode->i_mutex; + if (wh_dentry + || au_opt_test(au_mntflags(sb), ALWAYS_DIROPQ)) { + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + opq_dentry = au_diropq_create(dentry, bindex); + mutex_unlock(h_mtx); + err = PTR_ERR(opq_dentry); + if (IS_ERR(opq_dentry)) + goto out_dir; + dput(opq_dentry); + diropq = 1; + } + + err = epilog(dir, bindex, wh_dentry, dentry); + if (!err) { + inc_nlink(dir); + goto out_unlock; /* success */ + } + + /* revert */ + if (diropq) { + AuLabel(revert opq); + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + rerr = au_diropq_remove(dentry, bindex); + mutex_unlock(h_mtx); + if (rerr) { + AuIOErr("%.*s reverting diropq failed(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + err = -EIO; + } + } + + out_dir: + AuLabel(revert dir); + rerr = vfsub_rmdir(au_pinned_h_dir(&pin), &h_path); + if (rerr) { + AuIOErr("%.*s reverting dir failed(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + err = -EIO; + } + d_drop(dentry); + au_dtime_revert(&dt); + out_unlock: + au_unpin(&pin); + dput(wh_dentry); + out: + if (unlikely(err)) { + au_update_dbstart(dentry); + d_drop(dentry); + } + di_write_unlock(parent); + aufs_read_unlock(dentry, AuLock_DW); + return err; +} diff -urN ./fs/aufs/i_op_del.c ./fs/aufs/i_op_del.c --- ./fs/aufs/i_op_del.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/i_op_del.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,471 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode operations (del entry) + */ + +#include "aufs.h" + +/* + * decide if a new whiteout for @dentry is necessary or not. + * when it is necessary, prepare the parent dir for the upper branch whose + * branch index is @bcpup for creation. the actual creation of the whiteout will + * be done by caller. + * return value: + * 0: wh is unnecessary + * plus: wh is necessary + * minus: error + */ +int au_wr_dir_need_wh(struct dentry *dentry, int isdir, aufs_bindex_t *bcpup) +{ + int need_wh, err; + aufs_bindex_t bstart; + struct super_block *sb; + + sb = dentry->d_sb; + bstart = au_dbstart(dentry); + if (*bcpup < 0) { + *bcpup = bstart; + if (au_test_ro(sb, bstart, dentry->d_inode)) { + err = AuWbrCopyup(au_sbi(sb), dentry); + *bcpup = err; + if (unlikely(err < 0)) + goto out; + } + } else + AuDebugOn(bstart < *bcpup + || au_test_ro(sb, *bcpup, dentry->d_inode)); + AuDbg("bcpup %d, bstart %d\n", *bcpup, bstart); + + if (*bcpup != bstart) { + err = au_cpup_dirs(dentry, *bcpup); + if (unlikely(err)) + goto out; + need_wh = 1; + } else { + aufs_bindex_t old_bend, new_bend, bdiropq = -1; + + old_bend = au_dbend(dentry); + if (isdir) { + bdiropq = au_dbdiropq(dentry); + au_set_dbdiropq(dentry, -1); + } + need_wh = au_lkup_dentry(dentry, bstart + 1, /*type*/0, + /*nd*/NULL); + err = need_wh; + if (isdir) + au_set_dbdiropq(dentry, bdiropq); + if (unlikely(err < 0)) + goto out; + new_bend = au_dbend(dentry); + if (!need_wh && old_bend != new_bend) { + au_set_h_dptr(dentry, new_bend, NULL); + au_set_dbend(dentry, old_bend); + } + } + AuDbg("need_wh %d\n", need_wh); + err = need_wh; + + out: + return err; +} + +/* + * simple tests for the del-entry operations. + * following the checks in vfs, plus the parent-child relationship. + */ +int au_may_del(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent, int isdir) +{ + int err; + umode_t h_mode; + struct dentry *h_dentry, *h_latest; + struct inode *h_inode; + + h_dentry = au_h_dptr(dentry, bindex); + h_inode = h_dentry->d_inode; + if (dentry->d_inode) { + err = -ENOENT; + if (unlikely(!h_inode || !h_inode->i_nlink)) + goto out; + + h_mode = h_inode->i_mode; + if (!isdir) { + err = -EISDIR; + if (unlikely(S_ISDIR(h_mode))) + goto out; + } else if (unlikely(!S_ISDIR(h_mode))) { + err = -ENOTDIR; + goto out; + } + } else { + /* rename(2) case */ + err = -EIO; + if (unlikely(h_inode)) + goto out; + } + + err = -ENOENT; + /* expected parent dir is locked */ + if (unlikely(h_parent != h_dentry->d_parent)) + goto out; + err = 0; + + /* + * rmdir a dir may break the consistency on some filesystem. + * let's try heavy test. + */ + err = -EACCES; + if (unlikely(au_test_h_perm(h_parent->d_inode, MAY_EXEC | MAY_WRITE))) + goto out; + + h_latest = au_sio_lkup_one(&dentry->d_name, h_parent, + au_sbr(dentry->d_sb, bindex)); + err = -EIO; + if (IS_ERR(h_latest)) + goto out; + if (h_latest == h_dentry) + err = 0; + dput(h_latest); + + out: + return err; +} + +/* + * decide the branch where we operate for @dentry. the branch index will be set + * @rbcpup. after diciding it, 'pin' it and store the timestamps of the parent + * dir for reverting. + * when a new whiteout is necessary, create it. + */ +static struct dentry* +lock_hdir_create_wh(struct dentry *dentry, int isdir, aufs_bindex_t *rbcpup, + struct au_dtime *dt, struct au_pin *pin) +{ + struct dentry *wh_dentry; + struct super_block *sb; + struct path h_path; + int err, need_wh; + unsigned int udba; + aufs_bindex_t bcpup; + + need_wh = au_wr_dir_need_wh(dentry, isdir, rbcpup); + wh_dentry = ERR_PTR(need_wh); + if (unlikely(need_wh < 0)) + goto out; + + sb = dentry->d_sb; + udba = au_opt_udba(sb); + bcpup = *rbcpup; + err = au_pin(pin, dentry, bcpup, udba, + AuPin_DI_LOCKED | AuPin_MNT_WRITE); + wh_dentry = ERR_PTR(err); + if (unlikely(err)) + goto out; + + h_path.dentry = au_pinned_h_parent(pin); + if (udba != AuOpt_UDBA_NONE + && au_dbstart(dentry) == bcpup) { + err = au_may_del(dentry, bcpup, h_path.dentry, isdir); + wh_dentry = ERR_PTR(err); + if (unlikely(err)) + goto out_unpin; + } + + h_path.mnt = au_sbr_mnt(sb, bcpup); + au_dtime_store(dt, au_pinned_parent(pin), &h_path); + wh_dentry = NULL; + if (!need_wh) + goto out; /* success, no need to create whiteout */ + + wh_dentry = au_wh_create(dentry, bcpup, h_path.dentry); + if (!IS_ERR(wh_dentry)) + goto out; /* success */ + /* returns with the parent is locked and wh_dentry is dget-ed */ + + out_unpin: + au_unpin(pin); + out: + return wh_dentry; +} + +/* + * when removing a dir, rename it to a unique temporary whiteout-ed name first + * in order to be revertible and save time for removing many child whiteouts + * under the dir. + * returns 1 when there are too many child whiteout and caller should remove + * them asynchronously. returns 0 when the number of children is enough small to + * remove now or the branch fs is a remote fs. + * otherwise return an error. + */ +static int renwh_and_rmdir(struct dentry *dentry, aufs_bindex_t bindex, + struct au_nhash *whlist, struct inode *dir) +{ + int rmdir_later, err, dirwh; + struct dentry *h_dentry; + struct super_block *sb; + + sb = dentry->d_sb; + h_dentry = au_h_dptr(dentry, bindex); + err = au_whtmp_ren(h_dentry, au_sbr(sb, bindex)); + if (unlikely(err)) + goto out; + + /* stop monitoring */ + au_hin_free(au_hi(dentry->d_inode, bindex)); + + if (!au_test_fs_remote(h_dentry->d_sb)) { + dirwh = au_sbi(sb)->si_dirwh; + rmdir_later = (dirwh <= 1); + if (!rmdir_later) + rmdir_later = au_nhash_test_longer_wh(whlist, bindex, + dirwh); + if (rmdir_later) + return rmdir_later; + } + + err = au_whtmp_rmdir(dir, bindex, h_dentry, whlist); + if (unlikely(err)) { + AuIOErr("rmdir %.*s, b%d failed, %d. ignored\n", + AuDLNPair(h_dentry), bindex, err); + err = 0; + } + + out: + return err; +} + +/* + * final procedure for deleting a entry. + * maintain dentry and iattr. + */ +static void epilog(struct inode *dir, struct dentry *dentry, + aufs_bindex_t bindex) +{ + struct inode *inode; + + /* + * even if this is not a dir, + * set S_DEAD here since we need to detect the dead inode. + */ + inode = dentry->d_inode; + if (!inode->i_nlink) + inode->i_flags |= S_DEAD; + d_drop(dentry); + inode->i_ctime = dir->i_ctime; + + if (atomic_read(&dentry->d_count) == 1) { + au_set_h_dptr(dentry, au_dbstart(dentry), NULL); + au_update_dbstart(dentry); + } + if (au_ibstart(dir) == bindex) + au_cpup_attr_timesizes(dir); + dir->i_version++; +} + +/* + * when an error happened, remove the created whiteout and revert everything. + */ +static int do_revert(int err, struct inode *dir, aufs_bindex_t bwh, + struct dentry *wh_dentry, struct dentry *dentry, + struct au_dtime *dt) +{ + int rerr; + struct path h_path = { + .dentry = wh_dentry, + .mnt = au_sbr_mnt(dir->i_sb, bwh) + }; + + rerr = au_wh_unlink_dentry(au_h_iptr(dir, bwh), &h_path, dentry); + if (!rerr) { + au_set_dbwh(dentry, bwh); + au_dtime_revert(dt); + return 0; + } + + AuIOErr("%.*s reverting whiteout failed(%d, %d)\n", + AuDLNPair(dentry), err, rerr); + return -EIO; +} + +/* ---------------------------------------------------------------------- */ + +int aufs_unlink(struct inode *dir, struct dentry *dentry) +{ + int err; + aufs_bindex_t bwh, bindex, bstart; + struct au_dtime dt; + struct au_pin pin; + struct path h_path; + struct inode *inode, *h_dir; + struct dentry *parent, *wh_dentry; + + IMustLock(dir); + inode = dentry->d_inode; + if (unlikely(!inode)) + return -ENOENT; /* possible? */ + IMustLock(inode); + + aufs_read_lock(dentry, AuLock_DW); + parent = dentry->d_parent; /* dir inode is locked */ + di_write_lock_parent(parent); + + bstart = au_dbstart(dentry); + bwh = au_dbwh(dentry); + bindex = -1; + wh_dentry = lock_hdir_create_wh(dentry, /*isdir*/0, &bindex, &dt, &pin); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out; + + h_path.mnt = au_sbr_mnt(dentry->d_sb, bstart); + h_path.dentry = au_h_dptr(dentry, bstart); + dget(h_path.dentry); + if (bindex == bstart) { + h_dir = au_pinned_h_dir(&pin); + err = vfsub_unlink(h_dir, &h_path, /*force*/0); + } else { + /* dir inode is locked */ + h_dir = wh_dentry->d_parent->d_inode; + IMustLock(h_dir); + err = 0; + } + + if (!err) { + drop_nlink(inode); + epilog(dir, dentry, bindex); + + /* update target timestamps */ + if (bindex == bstart) { + vfsub_update_h_iattr(&h_path, /*did*/NULL); /*ignore*/ + inode->i_ctime = h_path.dentry->d_inode->i_ctime; + } else + /* todo: this timestamp may be reverted later */ + inode->i_ctime = h_dir->i_ctime; + goto out_unlock; /* success */ + } + + /* revert */ + if (wh_dentry) { + int rerr; + + rerr = do_revert(err, dir, bwh, wh_dentry, dentry, &dt); + if (rerr) + err = rerr; + } + + out_unlock: + au_unpin(&pin); + dput(wh_dentry); + dput(h_path.dentry); + out: + di_write_unlock(parent); + aufs_read_unlock(dentry, AuLock_DW); + return err; +} + +int aufs_rmdir(struct inode *dir, struct dentry *dentry) +{ + int err, rmdir_later; + aufs_bindex_t bwh, bindex, bstart; + struct au_dtime dt; + struct au_pin pin; + struct inode *inode; + struct dentry *parent, *wh_dentry, *h_dentry; + struct au_whtmp_rmdir_args *args; + struct au_nhash *whlist; + + IMustLock(dir); + inode = dentry->d_inode; + err = -ENOENT; /* possible? */ + if (unlikely(!inode)) + goto out; + IMustLock(inode); + + whlist = au_nhash_new(GFP_NOFS); + err = PTR_ERR(whlist); + if (IS_ERR(whlist)) + goto out; + + err = -ENOMEM; + args = kmalloc(sizeof(*args), GFP_NOFS); + if (unlikely(!args)) + goto out_whlist; + + aufs_read_lock(dentry, AuLock_DW | AuLock_FLUSH); + parent = dentry->d_parent; /* dir inode is locked */ + di_write_lock_parent(parent); + err = au_test_empty(dentry, whlist); + if (unlikely(err)) + goto out_args; + + bstart = au_dbstart(dentry); + bwh = au_dbwh(dentry); + bindex = -1; + wh_dentry = lock_hdir_create_wh(dentry, /*isdir*/1, &bindex, &dt, &pin); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out_args; + + h_dentry = au_h_dptr(dentry, bstart); + dget(h_dentry); + rmdir_later = 0; + if (bindex == bstart) { + err = renwh_and_rmdir(dentry, bstart, whlist, dir); + if (err > 0) { + rmdir_later = err; + err = 0; + } + } else { + /* stop monitoring */ + au_hin_free(au_hi(inode, bstart)); + + /* dir inode is locked */ + IMustLock(wh_dentry->d_parent->d_inode); + err = 0; + } + + if (!err) { + clear_nlink(inode); + au_set_dbdiropq(dentry, -1); + epilog(dir, dentry, bindex); + + if (rmdir_later) { + au_whtmp_kick_rmdir(dir, bstart, h_dentry, whlist, + args); + args = NULL; + } + + goto out_unlock; /* success */ + } + + /* revert */ + AuLabel(revert); + if (wh_dentry) { + int rerr; + + rerr = do_revert(err, dir, bwh, wh_dentry, dentry, &dt); + if (rerr) + err = rerr; + } + + out_unlock: + au_unpin(&pin); + dput(wh_dentry); + dput(h_dentry); + out_args: + di_write_unlock(parent); + aufs_read_unlock(dentry, AuLock_DW); + kfree(args); + out_whlist: + au_nhash_del(whlist); + out: + return err; +} diff -urN ./fs/aufs/i_op_ren.c ./fs/aufs/i_op_ren.c --- ./fs/aufs/i_op_ren.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/i_op_ren.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,929 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode operation (rename entry) + * todo: this is crazy monster + */ + +#include "aufs.h" + +enum { AuSRC, AuDST, AuSrcDst }; +enum { AuPARENT, AuCHILD, AuParentChild }; + +#define AuRen_ISDIR 1 +#define AuRen_ISSAMEDIR (1 << 1) +#define AuRen_WHSRC (1 << 2) +#define AuRen_WHDST (1 << 3) +#define AuRen_MNT_WRITE (1 << 4) +#define AuRen_DT_DSTDIR (1 << 5) +#define AuRen_DIROPQ (1 << 6) +#define AuRen_CPUP (1 << 7) +#define au_ftest_ren(flags, name) ((flags) & AuRen_##name) +#define au_fset_ren(flags, name) { (flags) |= AuRen_##name; } +#define au_fclr_ren(flags, name) { (flags) &= ~AuRen_##name; } + +struct au_ren_args { + struct { + struct dentry *dentry, *h_dentry, *parent, *h_parent, + *wh_dentry; + struct inode *dir, *inode; + struct au_hinode *hdir; + struct au_dtime dt[AuParentChild]; + aufs_bindex_t bstart; + } sd[AuSrcDst]; + +#define src_dentry sd[AuSRC].dentry +#define src_dir sd[AuSRC].dir +#define src_inode sd[AuSRC].inode +#define src_h_dentry sd[AuSRC].h_dentry +#define src_parent sd[AuSRC].parent +#define src_h_parent sd[AuSRC].h_parent +#define src_wh_dentry sd[AuSRC].wh_dentry +#define src_hdir sd[AuSRC].hdir +#define src_h_dir sd[AuSRC].hdir->hi_inode +#define src_dt sd[AuSRC].dt +#define src_bstart sd[AuSRC].bstart + +#define dst_dentry sd[AuDST].dentry +#define dst_dir sd[AuDST].dir +#define dst_inode sd[AuDST].inode +#define dst_h_dentry sd[AuDST].h_dentry +#define dst_parent sd[AuDST].parent +#define dst_h_parent sd[AuDST].h_parent +#define dst_wh_dentry sd[AuDST].wh_dentry +#define dst_hdir sd[AuDST].hdir +#define dst_h_dir sd[AuDST].hdir->hi_inode +#define dst_dt sd[AuDST].dt +#define dst_bstart sd[AuDST].bstart + + struct dentry *h_trap; + struct au_branch *br; + struct au_hinode *src_hinode; + struct path h_path; + struct au_nhash whlist; + aufs_bindex_t btgt; + + unsigned int flags; + + struct au_whtmp_rmdir_args *thargs; + struct dentry *h_dst; +}; + +/* ---------------------------------------------------------------------- */ + +/* + * functions for reverting. + * when an error happened in a single rename systemcall, we should revert + * everything as if nothing happend. + * we don't need to revert the copied-up/down the parent dir since they are + * harmless. + */ + +#define RevertFailure(fmt, args...) do { \ + AuIOErr("revert failure: " fmt " (%d, %d)\n", \ + ##args, err, rerr); \ + err = -EIO; \ +} while (0) + +static void au_ren_rev_diropq(int err, struct au_ren_args *a) +{ + int rerr; + + au_hin_imtx_lock_nested(a->src_hinode, AuLsc_I_CHILD); + rerr = au_diropq_remove(a->src_dentry, a->btgt); + au_hin_imtx_unlock(a->src_hinode); + if (rerr) + RevertFailure("remove diropq %.*s", AuDLNPair(a->src_dentry)); +} + + +static void au_ren_rev_rename(int err, struct au_ren_args *a) +{ + int rerr; + + a->h_path.dentry = au_lkup_one(&a->src_dentry->d_name, a->src_h_parent, + a->br, /*nd*/NULL); + rerr = PTR_ERR(a->h_path.dentry); + if (IS_ERR(a->h_path.dentry)) { + RevertFailure("au_lkup_one %.*s", AuDLNPair(a->src_dentry)); + return; + } + + rerr = vfsub_rename(a->dst_h_dir, + au_h_dptr(a->src_dentry, a->btgt), + a->src_h_dir, &a->h_path); + d_drop(a->h_path.dentry); + dput(a->h_path.dentry); + /* au_set_h_dptr(a->src_dentry, a->btgt, NULL); */ + if (rerr) + RevertFailure("rename %.*s", AuDLNPair(a->src_dentry)); +} + +static void au_ren_rev_cpup(int err, struct au_ren_args *a) +{ + int rerr; + + a->h_path.dentry = a->dst_h_dentry; + rerr = vfsub_unlink(a->dst_h_dir, &a->h_path, /*force*/0); + au_set_h_dptr(a->src_dentry, a->btgt, NULL); + au_set_dbstart(a->src_dentry, a->src_bstart); + if (rerr) + RevertFailure("unlink %.*s", AuDLNPair(a->dst_h_dentry)); +} + + +static void au_ren_rev_whtmp(int err, struct au_ren_args *a) +{ + int rerr; + + a->h_path.dentry = au_lkup_one(&a->dst_dentry->d_name, a->dst_h_parent, + a->br, /*nd*/NULL); + rerr = PTR_ERR(a->h_path.dentry); + if (IS_ERR(a->h_path.dentry)) { + RevertFailure("lookup %.*s", AuDLNPair(a->dst_dentry)); + return; + } + if (a->h_path.dentry->d_inode) { + d_drop(a->h_path.dentry); + dput(a->h_path.dentry); + return; + } + + rerr = vfsub_rename(a->dst_h_dir, a->h_dst, a->dst_h_dir, &a->h_path); + d_drop(a->h_path.dentry); + dput(a->h_path.dentry); + if (!rerr) { + au_set_h_dptr(a->dst_dentry, a->btgt, NULL); + au_set_h_dptr(a->dst_dentry, a->btgt, dget(a->h_dst)); + } else + RevertFailure("rename %.*s", AuDLNPair(a->h_dst)); +} + +static void au_ren_rev_whsrc(int err, struct au_ren_args *a) +{ + int rerr; + + a->h_path.dentry = a->src_wh_dentry; + rerr = au_wh_unlink_dentry(a->src_h_dir, &a->h_path, a->src_dentry); + if (rerr) + RevertFailure("unlink %.*s", AuDLNPair(a->src_wh_dentry)); +} + +static void au_ren_rev_drop(struct au_ren_args *a) +{ + struct dentry *d, *h_d; + int i; + aufs_bindex_t bend, bindex; + + for (i = 0; i < AuSrcDst; i++) { + d = a->sd[i].dentry; + d_drop(d); + bend = au_dbend(d); + for (bindex = au_dbstart(d); bindex <= bend; bindex++) { + h_d = au_h_dptr(d, bindex); + if (h_d) + d_drop(h_d); + } + } + + au_update_dbstart(a->dst_dentry); + if (a->thargs) + d_drop(a->h_dst); +} +#undef RevertFailure + +/* ---------------------------------------------------------------------- */ + +/* + * when we have to copyup the renaming entry, do it with the rename-target name + * in order to minimize the cost (the later actual rename is unnecessary). + * otherwise rename it on the target branch. + */ +static int au_ren_or_cpup(struct au_ren_args *a) +{ + int err; + struct dentry *d; + + d = a->src_dentry; + if (au_dbstart(d) == a->btgt) { + a->h_path.dentry = a->dst_h_dentry; + if (au_ftest_ren(a->flags, DIROPQ) + && au_dbdiropq(d) == a->btgt) + au_fclr_ren(a->flags, DIROPQ); + AuDebugOn(au_dbstart(d) != a->btgt); + err = vfsub_rename(a->src_h_dir, au_h_dptr(d, a->btgt), + a->dst_h_dir, &a->h_path); + } else { + struct mutex *h_mtx = &a->src_h_dentry->d_inode->i_mutex; + + au_fset_ren(a->flags, CPUP); + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + au_set_dbstart(d, a->btgt); + au_set_h_dptr(d, a->btgt, dget(a->dst_h_dentry)); + err = au_sio_cpup_single(d, a->btgt, a->src_bstart, -1, + !AuCpup_DTIME, a->dst_parent); + if (unlikely(err)) { + au_set_h_dptr(d, a->btgt, NULL); + au_set_dbstart(d, a->src_bstart); + } + mutex_unlock(h_mtx); + } + + return err; +} + +/* cf. aufs_rmdir() */ +static int au_ren_del_whtmp(struct au_ren_args *a) +{ + int err; + struct inode *dir; + + dir = a->dst_dir; + if (!au_nhash_test_longer_wh(&a->whlist, a->btgt, + au_sbi(dir->i_sb)->si_dirwh) + || au_test_fs_remote(a->h_dst->d_sb)) { + err = au_whtmp_rmdir(dir, a->btgt, a->h_dst, &a->whlist); + if (unlikely(err)) + AuWarn("failed removing whtmp dir %.*s (%d), " + "ignored.\n", AuDLNPair(a->h_dst), err); + } else { + au_whtmp_kick_rmdir(dir, a->btgt, a->h_dst, &a->whlist, + a->thargs); + dput(a->h_dst); + a->thargs = NULL; + } + + return 0; +} + +/* make it 'opaque' dir. */ +static int au_ren_diropq(struct au_ren_args *a) +{ + int err; + struct dentry *diropq; + + err = 0; + a->src_hinode = au_hi(a->src_inode, a->btgt); + au_hin_imtx_lock_nested(a->src_hinode, AuLsc_I_CHILD); + diropq = au_diropq_create(a->src_dentry, a->btgt); + au_hin_imtx_unlock(a->src_hinode); + if (IS_ERR(diropq)) + err = PTR_ERR(diropq); + dput(diropq); + + return err; +} + +static int do_rename(struct au_ren_args *a) +{ + int err; + struct dentry *d, *h_d; + + /* prepare workqueue args for asynchronous rmdir */ + h_d = a->dst_h_dentry; + if (au_ftest_ren(a->flags, ISDIR) && h_d->d_inode) { + err = -ENOMEM; + a->thargs = kmalloc(sizeof(*a->thargs), GFP_NOFS); + if (unlikely(!a->thargs)) + goto out; + a->h_dst = dget(h_d); + } + + /* create whiteout for src_dentry */ + if (au_ftest_ren(a->flags, WHSRC)) { + a->src_wh_dentry + = au_wh_create(a->src_dentry, a->btgt, a->src_h_parent); + err = PTR_ERR(a->src_wh_dentry); + if (IS_ERR(a->src_wh_dentry)) + goto out_thargs; + } + + /* lookup whiteout for dentry */ + if (au_ftest_ren(a->flags, WHDST)) { + h_d = au_wh_lkup(a->dst_h_parent, &a->dst_dentry->d_name, + a->br); + err = PTR_ERR(h_d); + if (IS_ERR(h_d)) + goto out_whsrc; + if (!h_d->d_inode) + dput(h_d); + else + a->dst_wh_dentry = h_d; + } + + /* rename dentry to tmpwh */ + if (a->thargs) { + err = au_whtmp_ren(a->dst_h_dentry, a->br); + if (unlikely(err)) + goto out_whdst; + + d = a->dst_dentry; + au_set_h_dptr(d, a->btgt, NULL); + err = au_lkup_neg(d, a->btgt); + if (unlikely(err)) + goto out_whtmp; + a->dst_h_dentry = au_h_dptr(d, a->btgt); + } + + /* cpup src */ + if (a->dst_h_dentry->d_inode && a->src_bstart != a->btgt) { + struct mutex *h_mtx = &a->src_h_dentry->d_inode->i_mutex; + + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + err = au_sio_cpup_simple(a->src_dentry, a->btgt, -1, + !AuCpup_DTIME); + mutex_unlock(h_mtx); + if (unlikely(err)) + goto out_whtmp; + } + + /* rename by vfs_rename or cpup */ + d = a->dst_dentry; + if (au_ftest_ren(a->flags, ISDIR) + && (a->dst_wh_dentry + || au_dbdiropq(d) == a->btgt + /* hide the lower to keep xino */ + || a->btgt < au_dbend(d) + || au_opt_test(au_mntflags(d->d_sb), ALWAYS_DIROPQ))) + au_fset_ren(a->flags, DIROPQ); + err = au_ren_or_cpup(a); + if (unlikely(err)) + /* leave the copied-up one */ + goto out_whtmp; + + /* make dir opaque */ + if (au_ftest_ren(a->flags, DIROPQ)) { + err = au_ren_diropq(a); + if (unlikely(err)) + goto out_rename; + } + + /* update target timestamps */ + AuDebugOn(au_dbstart(a->src_dentry) != a->btgt); + a->h_path.dentry = au_h_dptr(a->src_dentry, a->btgt); + vfsub_update_h_iattr(&a->h_path, /*did*/NULL); /*ignore*/ + a->src_inode->i_ctime = a->h_path.dentry->d_inode->i_ctime; + + /* remove whiteout for dentry */ + if (a->dst_wh_dentry) { + a->h_path.dentry = a->dst_wh_dentry; + err = au_wh_unlink_dentry(a->dst_h_dir, &a->h_path, + a->dst_dentry); + if (unlikely(err)) + goto out_diropq; + } + + /* remove whtmp */ + if (a->thargs) + au_ren_del_whtmp(a); /* ignore this error */ + + err = 0; + goto out_success; + + out_diropq: + if (au_ftest_ren(a->flags, DIROPQ)) + au_ren_rev_diropq(err, a); + out_rename: + if (!au_ftest_ren(a->flags, CPUP)) + au_ren_rev_rename(err, a); + else + au_ren_rev_cpup(err, a); + out_whtmp: + if (a->thargs) + au_ren_rev_whtmp(err, a); + out_whdst: + dput(a->dst_wh_dentry); + a->dst_wh_dentry = NULL; + out_whsrc: + if (a->src_wh_dentry) + au_ren_rev_whsrc(err, a); + au_ren_rev_drop(a); + out_success: + dput(a->src_wh_dentry); + dput(a->dst_wh_dentry); + out_thargs: + if (a->thargs) { + dput(a->h_dst); + kfree(a->thargs); + } + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * test if @dentry dir can be rename destination or not. + * success means, it is a logically empty dir. + */ +static int may_rename_dstdir(struct dentry *dentry, struct au_nhash *whlist) +{ + return au_test_empty(dentry, whlist); +} + +/* + * test if @dentry dir can be rename source or not. + * if it can, return 0 and @children is filled. + * success means, + * - it is a logically empty dir. + * - or, it exists on writable branch and has no children including whiteouts + * on the lower branch. + */ +static int may_rename_srcdir(struct dentry *dentry, aufs_bindex_t btgt) +{ + int err; + aufs_bindex_t bstart; + + bstart = au_dbstart(dentry); + if (bstart != btgt) { + struct au_nhash *whlist; + + whlist = au_nhash_new(GFP_NOFS); + err = PTR_ERR(whlist); + if (IS_ERR(whlist)) + goto out; + err = au_test_empty(dentry, whlist); + au_nhash_del(whlist); + goto out; + } + + if (bstart == au_dbtaildir(dentry)) + return 0; /* success */ + + err = au_test_empty_lower(dentry); + + out: + if (err == -ENOTEMPTY) { + AuWarn1("renaming dir who has child(ren) on multiple branches," + " is not supported\n"); + err = -EXDEV; + } + return err; +} + +/* side effect: sets whlist and h_dentry */ +static int au_ren_may_dir(struct au_ren_args *a) +{ + int err; + struct dentry *d; + + err = 0; + au_nhash_init(&a->whlist); + d = a->dst_dentry; + if (au_ftest_ren(a->flags, ISDIR) && a->dst_inode) { + au_set_dbstart(d, a->dst_bstart); + err = may_rename_dstdir(d, &a->whlist); + au_set_dbstart(d, a->btgt); + } + a->dst_h_dentry = au_h_dptr(d, au_dbstart(d)); + if (unlikely(err)) + goto out; + + d = a->src_dentry; + a->src_h_dentry = au_h_dptr(d, au_dbstart(d)); + if (au_ftest_ren(a->flags, ISDIR)) { + err = may_rename_srcdir(d, a->btgt); + if (unlikely(err)) + au_nhash_fin(&a->whlist); + } + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * simple tests for rename. + * following the checks in vfs, plus the parent-child relationship. + */ +static int au_may_ren(struct au_ren_args *a) +{ + int err, isdir; + struct inode *h_inode; + + if (a->src_bstart == a->btgt) { + err = au_may_del(a->src_dentry, a->btgt, a->src_h_parent, + au_ftest_ren(a->flags, ISDIR)); + if (unlikely(err)) + goto out; + err = -EINVAL; + if (unlikely(a->src_h_dentry == a->h_trap)) + goto out; + } + + err = 0; + if (a->dst_bstart != a->btgt) + goto out; + + err = -EIO; + h_inode = a->dst_h_dentry->d_inode; + isdir = !!au_ftest_ren(a->flags, ISDIR); + if (!a->dst_dentry->d_inode) { + if (unlikely(h_inode)) + goto out; + err = au_may_add(a->dst_dentry, a->btgt, a->dst_h_parent, + isdir); + } else { + if (unlikely(!h_inode || !h_inode->i_nlink)) + goto out; + err = au_may_del(a->dst_dentry, a->btgt, a->dst_h_parent, + isdir); + if (unlikely(err)) + goto out; + err = -ENOTEMPTY; + if (unlikely(a->dst_h_dentry == a->h_trap)) + goto out; + err = 0; + } + + out: + if (unlikely(err == -ENOENT || err == -EEXIST)) + err = -EIO; + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * locking order + * (VFS) + * - src_dir and dir by lock_rename() + * - inode if exitsts + * (aufs) + * - lock all + * + src_dentry and dentry by aufs_read_and_write_lock2() which calls, + * + si_read_lock + * + di_write_lock2_child() + * + di_write_lock_child() + * + ii_write_lock_child() + * + di_write_lock_child2() + * + ii_write_lock_child2() + * + src_parent and parent + * + di_write_lock_parent() + * + ii_write_lock_parent() + * + di_write_lock_parent2() + * + ii_write_lock_parent2() + * + lower src_dir and dir by vfsub_lock_rename() + * + verify the every relationships between child and parent. if any + * of them failed, unlock all and return -EBUSY. + */ +static void au_ren_unlock(struct au_ren_args *a) +{ + struct super_block *sb; + + sb = a->dst_dentry->d_sb; + if (au_ftest_ren(a->flags, MNT_WRITE)) + mnt_drop_write(a->br->br_mnt); + vfsub_unlock_rename(a->src_h_parent, a->src_hdir, + a->dst_h_parent, a->dst_hdir); +} + +static int au_ren_lock(struct au_ren_args *a) +{ + int err; + unsigned int udba; + + err = 0; + a->src_h_parent = au_h_dptr(a->src_parent, a->btgt); + a->src_hdir = au_hi(a->src_dir, a->btgt); + a->dst_h_parent = au_h_dptr(a->dst_parent, a->btgt); + a->dst_hdir = au_hi(a->dst_dir, a->btgt); + a->h_trap = vfsub_lock_rename(a->src_h_parent, a->src_hdir, + a->dst_h_parent, a->dst_hdir); + udba = au_opt_udba(a->src_dentry->d_sb); + if (au_dbstart(a->src_dentry) == a->btgt) + err = au_h_verify(a->src_h_dentry, udba, + a->src_h_parent->d_inode, a->src_h_parent, + a->br); + if (!err && au_dbstart(a->dst_dentry) == a->btgt) + err = au_h_verify(a->dst_h_dentry, udba, + a->dst_h_parent->d_inode, a->dst_h_parent, + a->br); + if (!err) { + err = mnt_want_write(a->br->br_mnt); + if (unlikely(err)) + goto out_unlock; + au_fset_ren(a->flags, MNT_WRITE); + goto out; /* success */ + } + + err = au_busy_or_stale(); + + out_unlock: + au_ren_unlock(a); + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +static void au_ren_refresh_dir(struct au_ren_args *a) +{ + struct inode *dir; + + dir = a->dst_dir; + dir->i_version++; + if (au_ftest_ren(a->flags, ISDIR)) { + /* is this updating defined in POSIX? */ + au_cpup_attr_timesizes(a->src_inode); + au_cpup_attr_nlink(dir, /*force*/1); + if (a->dst_inode) { + clear_nlink(a->dst_inode); + au_cpup_attr_timesizes(a->dst_inode); + } + } + if (au_ibstart(dir) == a->btgt) + au_cpup_attr_timesizes(dir); + + if (au_ftest_ren(a->flags, ISSAMEDIR)) + return; + + dir = a->src_dir; + dir->i_version++; + if (au_ftest_ren(a->flags, ISDIR)) + au_cpup_attr_nlink(dir, /*force*/1); + if (au_ibstart(dir) == a->btgt) + au_cpup_attr_timesizes(dir); +} + +static void au_ren_refresh(struct au_ren_args *a) +{ + aufs_bindex_t bend, bindex; + struct dentry *d, *h_d; + struct inode *i, *h_i; + struct super_block *sb; + + d = a->src_dentry; + au_set_dbwh(d, -1); + bend = au_dbend(d); + for (bindex = a->btgt + 1; bindex <= bend; bindex++) { + h_d = au_h_dptr(d, bindex); + if (h_d) + au_set_h_dptr(d, bindex, NULL); + } + au_set_dbend(d, a->btgt); + + sb = d->d_sb; + i = a->src_inode; + if (au_opt_test(au_mntflags(sb), PLINK) && au_plink_test(i)) + return; /* success */ + + bend = au_ibend(i); + for (bindex = a->btgt + 1; bindex <= bend; bindex++) { + h_i = au_h_iptr(i, bindex); + if (h_i) { + au_xino_write0(sb, bindex, h_i->i_ino, 0); + /* ignore this error */ + au_set_h_iptr(i, bindex, NULL, 0); + } + } + au_set_ibend(i, a->btgt); +} + +/* ---------------------------------------------------------------------- */ + +/* mainly for link(2) and rename(2) */ +int au_wbr(struct dentry *dentry, aufs_bindex_t btgt) +{ + aufs_bindex_t bdiropq, bwh; + struct dentry *parent; + struct au_branch *br; + + parent = dentry->d_parent; + IMustLock(parent->d_inode); /* dir is locked */ + + bdiropq = au_dbdiropq(parent); + bwh = au_dbwh(dentry); + br = au_sbr(dentry->d_sb, btgt); + if (au_br_rdonly(br) + || (0 <= bdiropq && bdiropq < btgt) + || (0 <= bwh && bwh < btgt)) + btgt = -1; + + AuDbg("btgt %d\n", btgt); + return btgt; +} + +/* sets src_bstart, dst_bstart and btgt */ +static int au_ren_wbr(struct au_ren_args *a) +{ + int err; + struct au_wr_dir_args wr_dir_args = { + /* .force_btgt = -1, */ + .flags = AuWrDir_ADD_ENTRY + }; + + a->src_bstart = au_dbstart(a->src_dentry); + a->dst_bstart = au_dbstart(a->dst_dentry); + if (au_ftest_ren(a->flags, ISDIR)) + au_fset_wrdir(wr_dir_args.flags, ISDIR); + wr_dir_args.force_btgt = a->src_bstart; + if (a->dst_inode && a->dst_bstart < a->src_bstart) + wr_dir_args.force_btgt = a->dst_bstart; + wr_dir_args.force_btgt = au_wbr(a->dst_dentry, wr_dir_args.force_btgt); + err = au_wr_dir(a->dst_dentry, a->src_dentry, &wr_dir_args); + a->btgt = err; + + return err; +} + +static void au_ren_dt(struct au_ren_args *a) +{ + a->h_path.dentry = a->src_h_parent; + au_dtime_store(a->src_dt + AuPARENT, a->src_parent, &a->h_path); + if (!au_ftest_ren(a->flags, ISSAMEDIR)) { + a->h_path.dentry = a->dst_h_parent; + au_dtime_store(a->dst_dt + AuPARENT, a->dst_parent, &a->h_path); + } + + au_fclr_ren(a->flags, DT_DSTDIR); + if (!au_ftest_ren(a->flags, ISDIR)) + return; + + a->h_path.dentry = a->src_h_dentry; + au_dtime_store(a->src_dt + AuCHILD, a->src_dentry, &a->h_path); + if (a->dst_h_dentry->d_inode) { + au_fset_ren(a->flags, DT_DSTDIR); + a->h_path.dentry = a->dst_h_dentry; + au_dtime_store(a->dst_dt + AuCHILD, a->dst_dentry, &a->h_path); + } +} + +static void au_ren_rev_dt(int err, struct au_ren_args *a) +{ + struct dentry *h_d; + struct mutex *h_mtx; + + au_dtime_revert(a->src_dt + AuPARENT); + if (!au_ftest_ren(a->flags, ISSAMEDIR)) + au_dtime_revert(a->dst_dt + AuPARENT); + + if (au_ftest_ren(a->flags, ISDIR) && err != -EIO) { + h_d = a->src_dt[AuCHILD].dt_h_path.dentry; + h_mtx = &h_d->d_inode->i_mutex; + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + au_dtime_revert(a->src_dt + AuCHILD); + mutex_unlock(h_mtx); + + if (au_ftest_ren(a->flags, DT_DSTDIR)) { + h_d = a->dst_dt[AuCHILD].dt_h_path.dentry; + h_mtx = &h_d->d_inode->i_mutex; + mutex_lock_nested(h_mtx, AuLsc_I_CHILD); + au_dtime_revert(a->dst_dt + AuCHILD); + mutex_unlock(h_mtx); + } + } +} + +/* ---------------------------------------------------------------------- */ + +int aufs_rename(struct inode *_src_dir, struct dentry *_src_dentry, + struct inode *_dst_dir, struct dentry *_dst_dentry) +{ + int err; + /* reduce stack space */ + struct au_ren_args *a; + + IMustLock(_src_dir); + IMustLock(_dst_dir); + + err = -ENOMEM; + BUILD_BUG_ON(sizeof(*a) > PAGE_SIZE); + a = kzalloc(sizeof(*a), GFP_NOFS); + if (unlikely(!a)) + goto out; + + a->src_dir = _src_dir; + a->src_dentry = _src_dentry; + a->src_inode = a->src_dentry->d_inode; + a->src_parent = a->src_dentry->d_parent; /* dir inode is locked */ + a->dst_dir = _dst_dir; + a->dst_dentry = _dst_dentry; + a->dst_inode = a->dst_dentry->d_inode; + a->dst_parent = a->dst_dentry->d_parent; /* dir inode is locked */ + if (a->dst_inode) { + IMustLock(a->dst_inode); + au_igrab(a->dst_inode); + } + + err = -ENOTDIR; + if (S_ISDIR(a->src_inode->i_mode)) { + au_fset_ren(a->flags, ISDIR); + if (unlikely(a->dst_inode && !S_ISDIR(a->dst_inode->i_mode))) + goto out_free; + aufs_read_and_write_lock2(a->dst_dentry, a->src_dentry, + AuLock_DIR | AuLock_FLUSH); + } else + aufs_read_and_write_lock2(a->dst_dentry, a->src_dentry, + AuLock_FLUSH); + + au_fset_ren(a->flags, ISSAMEDIR); /* temporary */ + di_write_lock_parent(a->dst_parent); + + /* which branch we process */ + err = au_ren_wbr(a); + if (unlikely(err < 0)) + goto out_unlock; + a->br = au_sbr(a->dst_dentry->d_sb, a->btgt); + a->h_path.mnt = a->br->br_mnt; + + /* are they available to be renamed */ + err = au_ren_may_dir(a); + if (unlikely(err)) + goto out_unlock; + + /* prepare the writable parent dir on the same branch */ + if (a->dst_bstart == a->btgt) { + au_fset_ren(a->flags, WHDST); + } else { + err = au_cpup_dirs(a->dst_dentry, a->btgt); + if (unlikely(err)) + goto out_children; + } + + if (a->src_dir != a->dst_dir) { + /* + * this temporary unlock is safe, + * because both dir->i_mutex are locked. + */ + di_write_unlock(a->dst_parent); + di_write_lock_parent(a->src_parent); + err = au_wr_dir_need_wh(a->src_dentry, + au_ftest_ren(a->flags, ISDIR), + &a->btgt); + di_write_unlock(a->src_parent); + di_write_lock2_parent(a->src_parent, a->dst_parent, /*isdir*/1); + au_fclr_ren(a->flags, ISSAMEDIR); + } else + err = au_wr_dir_need_wh(a->src_dentry, + au_ftest_ren(a->flags, ISDIR), + &a->btgt); + if (unlikely(err < 0)) + goto out_children; + if (err) + au_fset_ren(a->flags, WHSRC); + + /* lock them all */ + err = au_ren_lock(a); + if (unlikely(err)) + goto out_children; + + if (!au_opt_test(au_mntflags(a->dst_dir->i_sb), UDBA_NONE)) { + err = au_may_ren(a); + if (unlikely(err)) + goto out_hdir; + } + + /* store timestamps to be revertible */ + au_ren_dt(a); + + /* here we go */ + err = do_rename(a); + if (unlikely(err)) + goto out_dt; + + /* update dir attributes */ + au_ren_refresh_dir(a); + + /* dput/iput all lower dentries */ + au_ren_refresh(a); + + goto out_hdir; /* success */ + + out_dt: + au_ren_rev_dt(err, a); + out_hdir: + au_ren_unlock(a); + out_children: + au_nhash_fin(&a->whlist); + out_unlock: + if (unlikely(err && au_ftest_ren(a->flags, ISDIR))) { + au_update_dbstart(a->dst_dentry); + d_drop(a->dst_dentry); + } + if (!err) { + d_move(a->src_dentry, a->dst_dentry); + if (a->dst_inode + && (a->dst_inode->i_nlink <= 1 + || au_ftest_ren(a->flags, ISDIR))) + a->dst_inode->i_flags |= S_DEAD; + } + if (au_ftest_ren(a->flags, ISSAMEDIR)) + di_write_unlock(a->dst_parent); + else + di_write_unlock2(a->src_parent, a->dst_parent); + aufs_read_and_write_unlock2(a->dst_dentry, a->src_dentry); + out_free: + iput(a->dst_inode); + kfree(a); + out: + return err; +} diff -urN ./fs/aufs/iinfo.c ./fs/aufs/iinfo.c --- ./fs/aufs/iinfo.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/iinfo.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,262 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode private data + */ + +#include "aufs.h" + +struct inode *au_h_iptr(struct inode *inode, aufs_bindex_t bindex) +{ + struct inode *h_inode; + + h_inode = au_ii(inode)->ii_hinode[0 + bindex].hi_inode; + AuDebugOn(h_inode && atomic_read(&h_inode->i_count) <= 0); + return h_inode; +} + +/* todo: hard/soft set? */ +void au_set_ibstart(struct inode *inode, aufs_bindex_t bindex) +{ + struct au_iinfo *iinfo = au_ii(inode); + struct inode *h_inode; + + iinfo->ii_bstart = bindex; + h_inode = iinfo->ii_hinode[bindex + 0].hi_inode; + if (h_inode) + au_cpup_igen(inode, h_inode); +} + +void au_hiput(struct au_hinode *hinode) +{ + au_hin_free(hinode); + dput(hinode->hi_whdentry); + iput(hinode->hi_inode); +} + +unsigned int au_hi_flags(struct inode *inode, int isdir) +{ + unsigned int flags; + const unsigned int mnt_flags = au_mntflags(inode->i_sb); + + flags = 0; + if (au_opt_test(mnt_flags, XINO)) + au_fset_hi(flags, XINO); + if (isdir && au_opt_test(mnt_flags, UDBA_HINOTIFY)) + au_fset_hi(flags, HINOTIFY); + return flags; +} + +void au_set_h_iptr(struct inode *inode, aufs_bindex_t bindex, + struct inode *h_inode, unsigned int flags) +{ + struct au_hinode *hinode; + struct inode *hi; + struct au_iinfo *iinfo = au_ii(inode); + + hinode = iinfo->ii_hinode + bindex; + hi = hinode->hi_inode; + AuDebugOn(h_inode && atomic_read(&h_inode->i_count) <= 0); + AuDebugOn(h_inode && hi); + + if (hi) + au_hiput(hinode); + hinode->hi_inode = h_inode; + if (h_inode) { + int err; + struct super_block *sb = inode->i_sb; + struct au_branch *br; + + if (bindex == iinfo->ii_bstart) + au_cpup_igen(inode, h_inode); + br = au_sbr(sb, bindex); + hinode->hi_id = br->br_id; + if (au_ftest_hi(flags, XINO)) { + err = au_xino_write(sb, bindex, h_inode->i_ino, + inode->i_ino); + if (unlikely(err)) + AuIOErr1("failed au_xino_write() %d\n", err); + } + + if (au_ftest_hi(flags, HINOTIFY) + && au_br_hinotifyable(br->br_perm)) { + err = au_hin_alloc(hinode, inode, h_inode); + if (unlikely(err)) + AuIOErr1("au_hin_alloc() %d\n", err); + } + } +} + +void au_set_hi_wh(struct inode *inode, aufs_bindex_t bindex, + struct dentry *h_wh) +{ + struct au_hinode *hinode; + + hinode = au_ii(inode)->ii_hinode + bindex; + AuDebugOn(hinode->hi_whdentry); + hinode->hi_whdentry = h_wh; +} + +void au_update_iigen(struct inode *inode) +{ + atomic_set(&au_ii(inode)->ii_generation, au_sigen(inode->i_sb)); + /* smp_mb(); */ /* atomic_set */ +} + +/* it may be called at remount time, too */ +void au_update_brange(struct inode *inode, int do_put_zero) +{ + struct au_iinfo *iinfo; + + iinfo = au_ii(inode); + if (!iinfo || iinfo->ii_bstart < 0) + return; + + if (do_put_zero) { + aufs_bindex_t bindex; + + for (bindex = iinfo->ii_bstart; bindex <= iinfo->ii_bend; + bindex++) { + struct inode *h_i; + + h_i = iinfo->ii_hinode[0 + bindex].hi_inode; + if (h_i && !h_i->i_nlink) + au_set_h_iptr(inode, bindex, NULL, 0); + } + } + + iinfo->ii_bstart = -1; + while (++iinfo->ii_bstart <= iinfo->ii_bend) + if (iinfo->ii_hinode[0 + iinfo->ii_bstart].hi_inode) + break; + if (iinfo->ii_bstart > iinfo->ii_bend) { + iinfo->ii_bstart = -1; + iinfo->ii_bend = -1; + return; + } + + iinfo->ii_bend++; + while (0 <= --iinfo->ii_bend) + if (iinfo->ii_hinode[0 + iinfo->ii_bend].hi_inode) + break; + AuDebugOn(iinfo->ii_bstart > iinfo->ii_bend || iinfo->ii_bend < 0); +} + +/* ---------------------------------------------------------------------- */ + +int au_iinfo_init(struct inode *inode) +{ + struct au_iinfo *iinfo; + struct super_block *sb; + int nbr, i; + + sb = inode->i_sb; + iinfo = &(container_of(inode, struct au_icntnr, vfs_inode)->iinfo); + nbr = au_sbend(sb) + 1; + if (unlikely(nbr <= 0)) + nbr = 1; + iinfo->ii_hinode = kcalloc(nbr, sizeof(*iinfo->ii_hinode), GFP_NOFS); + if (iinfo->ii_hinode) { + for (i = 0; i < nbr; i++) + iinfo->ii_hinode[i].hi_id = -1; + + atomic_set(&iinfo->ii_generation, au_sigen(sb)); + /* smp_mb(); */ /* atomic_set */ + init_rwsem(&iinfo->ii_rwsem); + iinfo->ii_bstart = -1; + iinfo->ii_bend = -1; + iinfo->ii_vdir = NULL; + return 0; + } + return -ENOMEM; +} + +int au_ii_realloc(struct au_iinfo *iinfo, int nbr) +{ + int err, sz; + struct au_hinode *hip; + + err = -ENOMEM; + sz = sizeof(*hip) * (iinfo->ii_bend + 1); + if (!sz) + sz = sizeof(*hip); + hip = au_kzrealloc(iinfo->ii_hinode, sz, sizeof(*hip) * nbr, GFP_NOFS); + if (hip) { + iinfo->ii_hinode = hip; + err = 0; + } + + return err; +} + +static int au_iinfo_write0(struct super_block *sb, struct au_hinode *hinode, + ino_t ino) +{ + int err; + aufs_bindex_t bindex; + unsigned char locked; + + err = 0; + locked = !!si_noflush_read_trylock(sb); + bindex = au_br_index(sb, hinode->hi_id); + if (bindex >= 0) + err = au_xino_write0(sb, bindex, hinode->hi_inode->i_ino, ino); + /* error action? */ + if (locked) + si_read_unlock(sb); + return err; +} + +void au_iinfo_fin(struct inode *inode) +{ + ino_t ino; + aufs_bindex_t bend; + unsigned char unlinked = !inode->i_nlink || IS_DEADDIR(inode); + struct au_iinfo *iinfo; + struct au_hinode *hi; + struct super_block *sb; + + if (unlinked) { + int err = au_xigen_inc(inode); + if (unlikely(err)) + AuWarn1("failed resetting i_generation, %d\n", err); + } + + iinfo = au_ii(inode); + /* bad_inode case */ + if (!iinfo) + return; + + if (iinfo->ii_vdir) + au_vdir_free(iinfo->ii_vdir); + + if (iinfo->ii_bstart >= 0) { + sb = inode->i_sb; + ino = 0; + if (unlinked) + ino = inode->i_ino; + hi = iinfo->ii_hinode + iinfo->ii_bstart; + bend = iinfo->ii_bend; + while (iinfo->ii_bstart++ <= bend) { + if (hi->hi_inode) { + if (unlinked || !hi->hi_inode->i_nlink) { + au_iinfo_write0(sb, hi, ino); + /* ignore this error */ + ino = 0; + } + au_hiput(hi); + } + hi++; + } + } + + kfree(iinfo->ii_hinode); + au_rwsem_destroy(&iinfo->ii_rwsem); +} diff -urN ./fs/aufs/inode.c ./fs/aufs/inode.c --- ./fs/aufs/inode.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/inode.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,356 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode functions + */ + +#include "aufs.h" + +static void au_refresh_hinode_attr(struct inode *inode, int do_version) +{ + au_cpup_attr_all(inode, /*force*/0); + au_update_iigen(inode); + if (do_version) + inode->i_version++; +} + +int au_refresh_hinode_self(struct inode *inode, int do_attr) +{ + int err; + aufs_bindex_t bindex, new_bindex; + unsigned char update; + struct inode *first; + struct au_hinode *p, *q, tmp; + struct super_block *sb; + struct au_iinfo *iinfo; + + update = 0; + sb = inode->i_sb; + iinfo = au_ii(inode); + err = au_ii_realloc(iinfo, au_sbend(sb) + 1); + if (unlikely(err)) + goto out; + + p = iinfo->ii_hinode + iinfo->ii_bstart; + first = p->hi_inode; + err = 0; + for (bindex = iinfo->ii_bstart; bindex <= iinfo->ii_bend; + bindex++, p++) { + if (!p->hi_inode) + continue; + + new_bindex = au_br_index(sb, p->hi_id); + if (new_bindex == bindex) + continue; + + if (new_bindex < 0) { + update++; + au_hiput(p); + p->hi_inode = NULL; + continue; + } + + if (new_bindex < iinfo->ii_bstart) + iinfo->ii_bstart = new_bindex; + if (iinfo->ii_bend < new_bindex) + iinfo->ii_bend = new_bindex; + /* swap two lower inode, and loop again */ + q = iinfo->ii_hinode + new_bindex; + tmp = *q; + *q = *p; + *p = tmp; + if (tmp.hi_inode) { + bindex--; + p--; + } + } + au_update_brange(inode, /*do_put_zero*/0); + if (do_attr) + au_refresh_hinode_attr(inode, update && S_ISDIR(inode->i_mode)); + + out: + return err; +} + +int au_refresh_hinode(struct inode *inode, struct dentry *dentry) +{ + int err, update; + unsigned int flags; + aufs_bindex_t bindex, bend; + unsigned char isdir; + struct inode *first; + struct au_hinode *p; + struct au_iinfo *iinfo; + + err = au_refresh_hinode_self(inode, /*do_attr*/0); + if (unlikely(err)) + goto out; + + update = 0; + iinfo = au_ii(inode); + p = iinfo->ii_hinode + iinfo->ii_bstart; + first = p->hi_inode; + isdir = S_ISDIR(inode->i_mode); + flags = au_hi_flags(inode, isdir); + bend = au_dbend(dentry); + for (bindex = au_dbstart(dentry); bindex <= bend; bindex++) { + struct inode *h_i; + struct dentry *h_d; + + h_d = au_h_dptr(dentry, bindex); + if (!h_d || !h_d->d_inode) + continue; + + if (iinfo->ii_bstart <= bindex && bindex <= iinfo->ii_bend) { + h_i = au_h_iptr(inode, bindex); + if (h_i) { + if (h_i == h_d->d_inode) + continue; + err = -EIO; + break; + } + } + if (bindex < iinfo->ii_bstart) + iinfo->ii_bstart = bindex; + if (iinfo->ii_bend < bindex) + iinfo->ii_bend = bindex; + au_set_h_iptr(inode, bindex, au_igrab(h_d->d_inode), flags); + update = 1; + } + au_update_brange(inode, /*do_put_zero*/0); + + if (unlikely(err)) + goto out; + + au_refresh_hinode_attr(inode, update && isdir); + + out: + return err; +} + +static int set_inode(struct inode *inode, struct dentry *dentry) +{ + int err; + unsigned int flags; + umode_t mode; + aufs_bindex_t bindex, bstart, btail; + unsigned char isdir; + struct dentry *h_dentry; + struct inode *h_inode; + struct au_iinfo *iinfo; + + err = 0; + isdir = 0; + bstart = au_dbstart(dentry); + h_inode = au_h_dptr(dentry, bstart)->d_inode; + mode = h_inode->i_mode; + switch (mode & S_IFMT) { + case S_IFREG: + btail = au_dbtail(dentry); + inode->i_op = &aufs_iop; + inode->i_fop = &aufs_file_fop; + inode->i_mapping->a_ops = &aufs_aop; + break; + case S_IFDIR: + isdir = 1; + btail = au_dbtaildir(dentry); + inode->i_op = &aufs_dir_iop; + inode->i_fop = &aufs_dir_fop; + break; + case S_IFLNK: + btail = au_dbtail(dentry); + inode->i_op = &aufs_symlink_iop; + break; + case S_IFBLK: + case S_IFCHR: + case S_IFIFO: + case S_IFSOCK: + btail = au_dbtail(dentry); + inode->i_op = &aufs_iop; + init_special_inode(inode, mode, h_inode->i_rdev); + break; + default: + AuIOErr("Unknown file type 0%o\n", mode); + err = -EIO; + goto out; + } + + flags = au_hi_flags(inode, isdir); + iinfo = au_ii(inode); + iinfo->ii_bstart = bstart; + iinfo->ii_bend = btail; + for (bindex = bstart; bindex <= btail; bindex++) { + h_dentry = au_h_dptr(dentry, bindex); + if (h_dentry) + au_set_h_iptr(inode, bindex, + au_igrab(h_dentry->d_inode), flags); + } + au_cpup_attr_all(inode, /*force*/1); + + out: + return err; +} + +/* successful returns with iinfo write_locked */ +static int reval_inode(struct inode *inode, struct dentry *dentry, int *matched) +{ + int err; + aufs_bindex_t bindex, bend; + struct inode *h_inode, *h_dinode; + + *matched = 0; + + /* + * before this function, if aufs got any iinfo lock, it must be only + * one, the parent dir. + * it can happen by UDBA and the obsoleted inode number. + */ + err = -EIO; + if (unlikely(inode->i_ino == parent_ino(dentry))) + goto out; + + ii_write_lock_new_child(inode); + if (unlikely(IS_DEADDIR(inode))) + goto out_unlock; + + err = 0; + h_dinode = au_h_dptr(dentry, au_dbstart(dentry))->d_inode; + bend = au_ibend(inode); + for (bindex = au_ibstart(inode); bindex <= bend; bindex++) { + h_inode = au_h_iptr(inode, bindex); + if (h_inode && h_inode == h_dinode) { + *matched = 1; + err = 0; + if (au_iigen(inode) != au_digen(dentry)) + err = au_refresh_hinode(inode, dentry); + break; + } + } + + out_unlock: + if (unlikely(err)) + ii_write_unlock(inode); + out: + return err; +} + +/* successful returns with iinfo write_locked */ +/* todo: return with unlocked? */ +struct inode *au_new_inode(struct dentry *dentry, int must_new) +{ + struct inode *inode; + struct dentry *h_dentry; + struct super_block *sb; + ino_t h_ino, ino; + int err, match; + aufs_bindex_t bstart; + + sb = dentry->d_sb; + bstart = au_dbstart(dentry); + h_dentry = au_h_dptr(dentry, bstart); + h_ino = h_dentry->d_inode->i_ino; + err = au_xino_read(sb, bstart, h_ino, &ino); + inode = ERR_PTR(err); + if (unlikely(err)) + goto out; + new_ino: + if (!ino) { + ino = au_xino_new_ino(sb); + if (unlikely(!ino)) { + inode = ERR_PTR(-EIO); + goto out; + } + } + + AuDbg("i%lu\n", (unsigned long)ino); + inode = au_iget_locked(sb, ino); + err = PTR_ERR(inode); + if (IS_ERR(inode)) + goto out; + + AuDbg("%lx, new %d\n", inode->i_state, !!(inode->i_state & I_NEW)); + if (inode->i_state & I_NEW) { + ii_write_lock_new_child(inode); + err = set_inode(inode, dentry); + unlock_new_inode(inode); + if (!err) + goto out; /* success */ + + iget_failed(inode); + ii_write_unlock(inode); + goto out_iput; + } else if (!must_new) { + err = reval_inode(inode, dentry, &match); + if (!err) + goto out; /* success */ + else if (match) + goto out_iput; + } + + if (unlikely(au_test_fs_unique_ino(h_dentry->d_inode))) + AuWarn1("Un-notified UDBA or repeatedly renamed dir," + " b%d, %s, %.*s, hi%lu, i%lu.\n", + bstart, au_sbtype(h_dentry->d_sb), AuDLNPair(dentry), + (unsigned long)h_ino, (unsigned long)ino); + ino = 0; + err = au_xino_write0(sb, bstart, h_ino, 0); + if (!err) { + iput(inode); + goto new_ino; + } + + out_iput: + iput(inode); + inode = ERR_PTR(err); + out: + return inode; +} + +/* ---------------------------------------------------------------------- */ + +int au_test_ro(struct super_block *sb, aufs_bindex_t bindex, + struct inode *inode) +{ + int err; + + err = au_br_rdonly(au_sbr(sb, bindex)); + + /* pseudo-link after flushed may happen out of bounds */ + if (!err + && inode + && au_ibstart(inode) <= bindex + && bindex <= au_ibend(inode)) { + /* + * permission check is unnecessary since vfsub routine + * will be called later + */ + struct inode *hi = au_h_iptr(inode, bindex); + if (hi) + err = IS_IMMUTABLE(hi) ? -EROFS : 0; + } + + return err; +} + +int au_test_h_perm(struct inode *h_inode, int mask) +{ + if (!current_fsuid()) + return 0; + return inode_permission(h_inode, mask); +} + +int au_test_h_perm_sio(struct inode *h_inode, int mask) +{ + if (au_test_nfs(h_inode->i_sb) + && (mask & MAY_WRITE) + && S_ISDIR(h_inode->i_mode)) + mask |= MAY_READ; /* force permission check */ + return au_test_h_perm(h_inode, mask); +} diff -urN ./fs/aufs/inode.h ./fs/aufs/inode.h --- ./fs/aufs/inode.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/inode.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,471 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * inode operations + */ + +#ifndef __AUFS_INODE_H__ +#define __AUFS_INODE_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include "rwsem.h" + +struct au_hinotify { +#ifdef CONFIG_AUFS_HINOTIFY + struct inotify_watch hin_watch; + struct inode *hin_aufs_inode; /* no get/put */ +#endif +}; + +struct au_hinode { + struct inode *hi_inode; + aufs_bindex_t hi_id; +#ifdef CONFIG_AUFS_HINOTIFY + struct au_hinotify *hi_notify; +#endif + + /* reference to the copied-up whiteout with get/put */ + struct dentry *hi_whdentry; +}; + +struct au_vdir; +struct au_iinfo { + atomic_t ii_generation; + struct super_block *ii_hsb1; /* no get/put */ + + struct rw_semaphore ii_rwsem; + aufs_bindex_t ii_bstart, ii_bend; + __u32 ii_higen; + struct au_hinode *ii_hinode; + struct au_vdir *ii_vdir; +}; + +struct au_icntnr { + struct au_iinfo iinfo; + struct inode vfs_inode; +}; + +/* au_pin flags */ +#define AuPin_DI_LOCKED 1 +#define AuPin_MNT_WRITE (1 << 1) +#define au_ftest_pin(flags, name) ((flags) & AuPin_##name) +#define au_fset_pin(flags, name) { (flags) |= AuPin_##name; } +#define au_fclr_pin(flags, name) { (flags) &= ~AuPin_##name; } + +struct au_pin { + /* input */ + struct dentry *dentry; + unsigned int udba; + unsigned char lsc_di, lsc_hi, flags; + aufs_bindex_t bindex; + + /* output */ + struct dentry *parent; + struct au_hinode *hdir; + struct vfsmount *h_mnt; +}; + +/* ---------------------------------------------------------------------- */ + +static inline struct au_iinfo *au_ii(struct inode *inode) +{ + struct au_iinfo *iinfo; + + iinfo = &(container_of(inode, struct au_icntnr, vfs_inode)->iinfo); + if (iinfo->ii_hinode) + return iinfo; + return NULL; /* debugging bad_inode case */ +} + +/* ---------------------------------------------------------------------- */ + +/* inode.c */ +int au_refresh_hinode_self(struct inode *inode, int do_attr); +int au_refresh_hinode(struct inode *inode, struct dentry *dentry); +struct inode *au_new_inode(struct dentry *dentry, int must_new); +int au_test_ro(struct super_block *sb, aufs_bindex_t bindex, + struct inode *inode); +int au_test_h_perm(struct inode *h_inode, int mask); +int au_test_h_perm_sio(struct inode *h_inode, int mask); + +/* i_op.c */ +extern struct inode_operations aufs_iop, aufs_symlink_iop, aufs_dir_iop; + +/* au_wr_dir flags */ +#define AuWrDir_ADD_ENTRY 1 +#define AuWrDir_ISDIR (1 << 1) +#define au_ftest_wrdir(flags, name) ((flags) & AuWrDir_##name) +#define au_fset_wrdir(flags, name) { (flags) |= AuWrDir_##name; } +#define au_fclr_wrdir(flags, name) { (flags) &= ~AuWrDir_##name; } + +struct au_wr_dir_args { + aufs_bindex_t force_btgt; + unsigned char flags; +}; +int au_wr_dir(struct dentry *dentry, struct dentry *src_dentry, + struct au_wr_dir_args *args); + +struct dentry *au_pinned_h_parent(struct au_pin *pin); +void au_pin_init(struct au_pin *pin, struct dentry *dentry, + aufs_bindex_t bindex, int lsc_di, int lsc_hi, + unsigned int udba, unsigned char flags); +int au_pin(struct au_pin *pin, struct dentry *dentry, aufs_bindex_t bindex, + unsigned int udba, unsigned char flags) __must_check; +int au_do_pin(struct au_pin *pin) __must_check; +void au_unpin(struct au_pin *pin); + +/* i_op_add.c */ +int au_may_add(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent, int isdir); +int aufs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev); +int aufs_symlink(struct inode *dir, struct dentry *dentry, const char *symname); +int aufs_create(struct inode *dir, struct dentry *dentry, int mode, + struct nameidata *nd); +int aufs_link(struct dentry *src_dentry, struct inode *dir, + struct dentry *dentry); +int aufs_mkdir(struct inode *dir, struct dentry *dentry, int mode); + +/* i_op_del.c */ +int au_wr_dir_need_wh(struct dentry *dentry, int isdir, aufs_bindex_t *bcpup); +int au_may_del(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent, int isdir); +int aufs_unlink(struct inode *dir, struct dentry *dentry); +int aufs_rmdir(struct inode *dir, struct dentry *dentry); + +/* i_op_ren.c */ +int au_wbr(struct dentry *dentry, aufs_bindex_t btgt); +int aufs_rename(struct inode *src_dir, struct dentry *src_dentry, + struct inode *dir, struct dentry *dentry); + +/* iinfo.c */ +struct inode *au_h_iptr(struct inode *inode, aufs_bindex_t bindex); +void au_hiput(struct au_hinode *hinode); +void au_set_ibstart(struct inode *inode, aufs_bindex_t bindex); +void au_set_hi_wh(struct inode *inode, aufs_bindex_t bindex, + struct dentry *h_wh); +unsigned int au_hi_flags(struct inode *inode, int isdir); + +/* hinode flags */ +#define AuHi_XINO 1 +#define AuHi_HINOTIFY (1 << 1) +#define au_ftest_hi(flags, name) ((flags) & AuHi_##name) +#define au_fset_hi(flags, name) { (flags) |= AuHi_##name; } +#define au_fclr_hi(flags, name) { (flags) &= ~AuHi_##name; } + +#ifndef CONFIG_AUFS_HINOTIFY +#undef AuHi_HINOTIFY +#define AuHi_HINOTIFY 0 +#endif + +void au_set_h_iptr(struct inode *inode, aufs_bindex_t bindex, + struct inode *h_inode, unsigned int flags); + +void au_update_iigen(struct inode *inode); +void au_update_brange(struct inode *inode, int do_put_zero); + +int au_iinfo_init(struct inode *inode); +void au_iinfo_fin(struct inode *inode); +int au_ii_realloc(struct au_iinfo *iinfo, int nbr); + +/* plink.c */ +void au_plink_block_maintain(struct super_block *sb); +#ifdef CONFIG_AUFS_DEBUG +void au_plink_list(struct super_block *sb); +#else +static inline void au_plink_list(struct super_block *sb) +{ + /* nothing */ +} +#endif +int au_plink_test(struct inode *inode); +struct dentry *au_plink_lkup(struct inode *inode, aufs_bindex_t bindex); +void au_plink_append(struct inode *inode, aufs_bindex_t bindex, + struct dentry *h_dentry); +void au_plink_put(struct super_block *sb); +void au_plink_half_refresh(struct super_block *sb, aufs_bindex_t br_id); + +/* ---------------------------------------------------------------------- */ + +/* lock subclass for iinfo */ +enum { + AuLsc_II_CHILD, /* child first */ + AuLsc_II_CHILD2, /* rename(2), link(2), and cpup at hinotify */ + AuLsc_II_CHILD3, /* copyup dirs */ + AuLsc_II_PARENT, /* see AuLsc_I_PARENT in vfsub.h */ + AuLsc_II_PARENT2, + AuLsc_II_PARENT3, /* copyup dirs */ + AuLsc_II_NEW_CHILD +}; + +/* + * ii_read_lock_child, ii_write_lock_child, + * ii_read_lock_child2, ii_write_lock_child2, + * ii_read_lock_child3, ii_write_lock_child3, + * ii_read_lock_parent, ii_write_lock_parent, + * ii_read_lock_parent2, ii_write_lock_parent2, + * ii_read_lock_parent3, ii_write_lock_parent3, + * ii_read_lock_new_child, ii_write_lock_new_child, + */ +#define AuReadLockFunc(name, lsc) \ +static inline void ii_read_lock_##name(struct inode *i) \ +{ \ + down_read_nested(&au_ii(i)->ii_rwsem, AuLsc_II_##lsc); \ +} + +#define AuWriteLockFunc(name, lsc) \ +static inline void ii_write_lock_##name(struct inode *i) \ +{ \ + down_write_nested(&au_ii(i)->ii_rwsem, AuLsc_II_##lsc); \ +} + +#define AuRWLockFuncs(name, lsc) \ + AuReadLockFunc(name, lsc) \ + AuWriteLockFunc(name, lsc) + +AuRWLockFuncs(child, CHILD); +AuRWLockFuncs(child2, CHILD2); +AuRWLockFuncs(child3, CHILD3); +AuRWLockFuncs(parent, PARENT); +AuRWLockFuncs(parent2, PARENT2); +AuRWLockFuncs(parent3, PARENT3); +AuRWLockFuncs(new_child, NEW_CHILD); + +#undef AuReadLockFunc +#undef AuWriteLockFunc +#undef AuRWLockFuncs + +/* + * ii_read_unlock, ii_write_unlock, ii_downgrade_lock + */ +AuSimpleUnlockRwsemFuncs(ii, struct inode *i, &au_ii(i)->ii_rwsem); + +#define IiMustNoWaiters(i) AuRwMustNoWaiters(&au_ii(i)->ii_rwsem) + +/* ---------------------------------------------------------------------- */ + +static inline unsigned int au_iigen(struct inode *inode) +{ + return atomic_read(&au_ii(inode)->ii_generation); +} + +/* tiny test for inode number */ +/* tmpfs generation is too rough */ +static inline int au_test_higen(struct inode *inode, struct inode *h_inode) +{ + struct au_iinfo *iinfo; + + iinfo = au_ii(inode); + return !(iinfo->ii_hsb1 == h_inode->i_sb + && iinfo->ii_higen == h_inode->i_generation); +} + +static inline struct inode *au_igrab(struct inode *inode) +{ + if (inode) { + AuDebugOn(!atomic_read(&inode->i_count)); + atomic_inc(&inode->i_count); + } + return inode; +} + +/* ---------------------------------------------------------------------- */ + +static inline aufs_bindex_t au_ii_br_id(struct inode *inode, + aufs_bindex_t bindex) +{ + return au_ii(inode)->ii_hinode[0 + bindex].hi_id; +} + +static inline aufs_bindex_t au_ibstart(struct inode *inode) +{ + return au_ii(inode)->ii_bstart; +} + +static inline aufs_bindex_t au_ibend(struct inode *inode) +{ + return au_ii(inode)->ii_bend; +} + +static inline struct au_vdir *au_ivdir(struct inode *inode) +{ + return au_ii(inode)->ii_vdir; +} + +static inline struct dentry *au_hi_wh(struct inode *inode, aufs_bindex_t bindex) +{ + return au_ii(inode)->ii_hinode[0 + bindex].hi_whdentry; +} + +static inline void au_set_ibend(struct inode *inode, aufs_bindex_t bindex) +{ + au_ii(inode)->ii_bend = bindex; +} + +static inline void au_set_ivdir(struct inode *inode, struct au_vdir *vdir) +{ + au_ii(inode)->ii_vdir = vdir; +} + +static inline struct au_hinode *au_hi(struct inode *inode, aufs_bindex_t bindex) +{ + return au_ii(inode)->ii_hinode + bindex; +} + +/* ---------------------------------------------------------------------- */ + +static inline struct dentry *au_pinned_parent(struct au_pin *pin) +{ + if (pin) + return pin->parent; + return NULL; +} + +static inline struct inode *au_pinned_h_dir(struct au_pin *pin) +{ + if (pin && pin->hdir) + return pin->hdir->hi_inode; + return NULL; +} + +static inline struct au_hinode *au_pinned_hdir(struct au_pin *pin) +{ + if (pin) + return pin->hdir; + return NULL; +} + +static inline void au_pin_set_dentry(struct au_pin *pin, struct dentry *dentry) +{ + if (pin) + pin->dentry = dentry; +} + +static inline void au_pin_set_parent_lflag(struct au_pin *pin, + unsigned char lflag) +{ + if (pin) { + /* dirty macros require brackets */ + if (lflag) { + au_fset_pin(pin->flags, DI_LOCKED); + } else { + au_fclr_pin(pin->flags, DI_LOCKED); + } + } +} + +static inline void au_pin_set_parent(struct au_pin *pin, struct dentry *parent) +{ + if (pin) { + dput(pin->parent); + pin->parent = dget(parent); + } +} + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_HINOTIFY +/* hinotify.c */ +int au_hin_alloc(struct au_hinode *hinode, struct inode *inode, + struct inode *h_inode); +void au_hin_free(struct au_hinode *hinode); +void au_hin_ctl(struct au_hinode *hinode, int do_set); +void au_reset_hinotify(struct inode *inode, unsigned int flags); + +int __init au_hinotify_init(void); +void au_hinotify_fin(void); + +static inline +void au_hin_init(struct au_hinode *hinode, struct au_hinotify *val) +{ + hinode->hi_notify = val; +} + +static inline void au_iigen_dec(struct inode *inode) +{ + atomic_dec(&au_ii(inode)->ii_generation); +} + +#else +static inline +int au_hin_alloc(struct au_hinode *hinode __maybe_unused, + struct inode *inode __maybe_unused, + struct inode *h_inode __maybe_unused) +{ + return -EOPNOTSUPP; +} + +static inline void au_hin_free(struct au_hinode *hinode __maybe_unused) +{ + /* nothing */ +} + +static inline void au_hin_ctl(struct au_hinode *hinode __maybe_unused, + int do_set __maybe_unused) +{ + /* nothing */ +} + +static inline void au_reset_hinotify(struct inode *inode __maybe_unused, + unsigned int flags __maybe_unused) +{ + /* nothing */ +} + +static inline int au_hinotify_init(void) +{ + return 0; +} + +#define au_hinotify_fin() do {} while (0) + +static inline +void au_hin_init(struct au_hinode *hinode __maybe_unused, + struct au_hinotify *val __maybe_unused) +{ + /* empty */ +} +#endif /* CONFIG_AUFS_HINOTIFY */ + +static inline void au_hin_suspend(struct au_hinode *hdir) +{ + au_hin_ctl(hdir, /*do_set*/0); +} + +static inline void au_hin_resume(struct au_hinode *hdir) +{ + au_hin_ctl(hdir, /*do_set*/1); +} + +static inline void au_hin_imtx_lock(struct au_hinode *hdir) +{ + mutex_lock(&hdir->hi_inode->i_mutex); + au_hin_suspend(hdir); +} + +static inline void au_hin_imtx_lock_nested(struct au_hinode *hdir, + unsigned int sc __maybe_unused) +{ + mutex_lock_nested(&hdir->hi_inode->i_mutex, sc); + au_hin_suspend(hdir); +} + +static inline void au_hin_imtx_unlock(struct au_hinode *hdir) +{ + au_hin_resume(hdir); + mutex_unlock(&hdir->hi_inode->i_mutex); +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_INODE_H__ */ diff -urN ./fs/aufs/ioctl.c ./fs/aufs/ioctl.c --- ./fs/aufs/ioctl.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/ioctl.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,54 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include "aufs.h" + +long aufs_ioctl_dir(struct file *file, unsigned int cmd, + unsigned long arg __maybe_unused) +{ + long err; + struct super_block *sb; + struct au_sbinfo *sbinfo; + + err = -EACCES; + if (!capable(CAP_SYS_ADMIN)) + goto out; + + err = 0; + sb = file->f_dentry->d_sb; + sbinfo = au_sbi(sb); + switch (cmd) { + case AUFS_CTL_PLINK_MAINT: + /* + * pseudo-link maintenance mode, + * cleared by aufs_release_dir() + */ + si_write_lock(sb); + if (!au_ftest_si(sbinfo, MAINTAIN_PLINK)) { + au_fset_si(sbinfo, MAINTAIN_PLINK); + au_fi(file)->fi_maintain_plink = 1; + } else + err = -EBUSY; + si_write_unlock(sb); + break; + case AUFS_CTL_PLINK_CLEAN: + if (au_opt_test(sbinfo->si_mntflags, PLINK)) { + aufs_write_lock(sb->s_root); + au_plink_put(sb); + aufs_write_unlock(sb->s_root); + } + break; + default: + err = -EINVAL; + } + + out: + return err; +} diff -urN ./fs/aufs/loop.c ./fs/aufs/loop.c --- ./fs/aufs/loop.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/loop.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,46 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * support for loopback block device as a branch + */ + +#include +#include "aufs.h" + +/* + * test if two lower dentries have overlapping branches. + */ +int au_test_loopback_overlap(struct super_block *sb, struct dentry *h_d1, + struct dentry *h_d2) +{ + struct inode *h_inode; + struct loop_device *l; + + h_inode = h_d1->d_inode; + if (MAJOR(h_inode->i_sb->s_dev) != LOOP_MAJOR) + return 0; + + l = h_inode->i_sb->s_bdev->bd_disk->private_data; + h_d1 = l->lo_backing_file->f_dentry; + /* h_d1 can be local NFS. in this case aufs cannot detect the loop */ + if (unlikely(h_d1->d_sb == sb)) + return 1; + return !!au_test_subdir(h_d1, h_d2); +} + +/* true if a kernel thread named 'loop[0-9].*' accesses a file */ +int au_test_loopback_kthread(void) +{ + const char c = current->comm[4]; + + return current->mm == NULL + && '0' <= c && c <= '9' + && strncmp(current->comm, "loop", 4) == 0; +} diff -urN ./fs/aufs/loop.h ./fs/aufs/loop.h --- ./fs/aufs/loop.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/loop.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,41 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * support for loopback mount as a branch + */ + +#ifndef __AUFS_LOOP_H__ +#define __AUFS_LOOP_H__ + +#ifdef __KERNEL__ + +#include + +#ifdef CONFIG_AUFS_BDEV_LOOP +/* loop.c */ +int au_test_loopback_overlap(struct super_block *sb, struct dentry *h_d1, + struct dentry *h_d2); +int au_test_loopback_kthread(void); +#else +static inline +int au_test_loopback_overlap(struct super_block *sb, struct dentry *h_d1, + struct dentry *h_d2) +{ + return 0; +} + +static inline int au_test_loopback_kthread(void) +{ + return 0; +} +#endif /* BLK_DEV_LOOP */ + +#endif /* __KERNEL__ */ +#endif /* __AUFS_LOOP_H__ */ diff -urN ./fs/aufs/magic.mk ./fs/aufs/magic.mk --- ./fs/aufs/magic.mk 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/magic.mk 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,58 @@ + +# defined in ${srctree}/fs/fuse/inode.c +# tristate +ifdef CONFIG_FUSE_FS +ccflags-y += -DFUSE_SUPER_MAGIC=0x65735546 +endif + +# defined in ${srctree}/fs/ocfs2/ocfs2_fs.h +# tristate +ifdef CONFIG_OCFS2_FS +ccflags-y += -DOCFS2_SUPER_MAGIC=0x7461636f +endif + +# defined in ${srctree}/fs/ocfs2/dlm/userdlm.h +# tristate +ifdef CONFIG_OCFS2_FS_O2CB +ccflags-y += -DDLMFS_MAGIC=0x76a9f425 +endif + +# defined in ${srctree}/fs/ramfs/inode.c +# always true +ccflags-y += -DRAMFS_MAGIC=0x858458f6 + +# defined in ${srctree}/fs/cifs/cifsfs.c +# tristate +ifdef CONFIG_CIFS_FS +ccflags-y += -DCIFS_MAGIC_NUMBER=0xFF534D42 +endif + +# defined in ${srctree}/fs/xfs/xfs_sb.h +# tristate +ifdef CONFIG_XFS_FS +ccflags-y += -DXFS_SB_MAGIC=0x58465342 +endif + +# defined in ${srctree}/fs/configfs/mount.c +# tristate +ifdef CONFIG_CONFIGFS_FS +ccflags-y += -DCONFIGFS_MAGIC=0x62656570 +endif + +# defined in ${srctree}/fs/9p/v9fs.h +# tristate +ifdef CONFIG_9P_FS +ccflags-y += -DV9FS_MAGIC=0x01021997 +endif + +# defined in ${srctree}/fs/ubifs/ubifs.h +# tristate +ifdef CONFIG_UBIFS_FS +ccflags-y += -DUBIFS_SUPER_MAGIC=0x24051905 +endif + +# defined in ${srctree}/fs/debugfs/inode.c +# boolean +ifdef CONFIG_DEBUG_FS +ccflags-y += -DDEBUGFS_MAGIC=0x64626720 +endif diff -urN ./fs/aufs/module.c ./fs/aufs/module.c --- ./fs/aufs/module.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/module.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,164 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * module global variables and operations + */ + +#include +#include +#include "aufs.h" + +void *au_kzrealloc(void *p, unsigned int nused, unsigned int new_sz, gfp_t gfp) +{ + if (new_sz <= nused) + return p; + + p = krealloc(p, new_sz, gfp); + if (p) + memset(p + nused, 0, new_sz - nused); + return p; +} + +/* ---------------------------------------------------------------------- */ + +/* + * aufs caches + */ +struct kmem_cache *au_cachep[AuCache_Last]; +static int __init au_cache_init(void) +{ + au_cachep[AuCache_DINFO] = AuCache(au_dinfo); + if (au_cachep[AuCache_DINFO]) + au_cachep[AuCache_ICNTNR] = AuCache(au_icntnr); + if (au_cachep[AuCache_ICNTNR]) + au_cachep[AuCache_FINFO] = AuCache(au_finfo); + if (au_cachep[AuCache_FINFO]) + au_cachep[AuCache_VDIR] = AuCache(au_vdir); + if (au_cachep[AuCache_VDIR]) + au_cachep[AuCache_DEHSTR] = AuCache(au_vdir_dehstr); + if (au_cachep[AuCache_DEHSTR]) + return 0; + + return -ENOMEM; +} + +static void au_cache_fin(void) +{ + int i; + for (i = 0; i < AuCache_Last; i++) + if (au_cachep[i]) { + kmem_cache_destroy(au_cachep[i]); + au_cachep[i] = NULL; + } +} + +/* ---------------------------------------------------------------------- */ + +int au_dir_roflags; + +/* + * functions for module interface. + */ +MODULE_LICENSE("GPL"); +/* MODULE_LICENSE("GPL v2"); */ +MODULE_AUTHOR("Junjiro R. Okajima"); +MODULE_DESCRIPTION(AUFS_NAME + " -- Advanced multi layered unification filesystem"); +MODULE_VERSION(AUFS_VERSION); + +/* it should be 'byte', but param_set_byte() prints it by "%c" */ +short aufs_nwkq = AUFS_NWKQ_DEF; +MODULE_PARM_DESC(nwkq, "the number of workqueue thread, " AUFS_WKQ_NAME); +module_param_named(nwkq, aufs_nwkq, short, S_IRUGO); + +/* this module parameter has no meaning when SYSFS is disabled */ +int sysaufs_brs = 1; +MODULE_PARM_DESC(brs, "use /fs/aufs/si_*/brN"); +module_param_named(brs, sysaufs_brs, int, S_IRUGO); + +/* ---------------------------------------------------------------------- */ + +static char au_esc_chars[0x20 + 3]; /* 0x01-0x20, backslash, del, and NULL */ + +int au_seq_path(struct seq_file *seq, struct path *path) +{ + return seq_path(seq, path, au_esc_chars); +} + +/* ---------------------------------------------------------------------- */ + +static int __init aufs_init(void) +{ + int err, i; + char *p; + + p = au_esc_chars; + for (i = 1; i <= ' '; i++) + *p++ = i; + *p++ = '\\'; + *p++ = '\x7f'; + *p = 0; + + au_dir_roflags = au_file_roflags(O_DIRECTORY | O_LARGEFILE); + + sysaufs_brs_init(); + au_debug_init(); + + err = -EINVAL; + if (unlikely(aufs_nwkq <= 0)) + goto out; + + err = sysaufs_init(); + if (unlikely(err)) + goto out; + err = au_wkq_init(); + if (unlikely(err)) + goto out_sysaufs; + err = au_hinotify_init(); + if (unlikely(err)) + goto out_wkq; + err = au_sysrq_init(); + if (unlikely(err)) + goto out_hin; + err = au_cache_init(); + if (unlikely(err)) + goto out_sysrq; + err = register_filesystem(&aufs_fs_type); + if (unlikely(err)) + goto out_cache; + pr_info(AUFS_NAME " " AUFS_VERSION "\n"); + goto out; /* success */ + + out_cache: + au_cache_fin(); + out_sysrq: + au_sysrq_fin(); + out_hin: + au_hinotify_fin(); + out_wkq: + au_wkq_fin(); + out_sysaufs: + sysaufs_fin(); + out: + return err; +} + +static void __exit aufs_exit(void) +{ + unregister_filesystem(&aufs_fs_type); + au_cache_fin(); + au_sysrq_fin(); + au_hinotify_fin(); + au_wkq_fin(); + sysaufs_fin(); +} + +module_init(aufs_init); +module_exit(aufs_exit); diff -urN ./fs/aufs/module.h ./fs/aufs/module.h --- ./fs/aufs/module.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/module.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,66 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * module initialization and module-global + */ + +#ifndef __AUFS_MODULE_H__ +#define __AUFS_MODULE_H__ + +#ifdef __KERNEL__ + +#include + +/* module parameters */ +extern short aufs_nwkq; +extern int sysaufs_brs; + +/* ---------------------------------------------------------------------- */ + +extern int au_dir_roflags; + +void *au_kzrealloc(void *p, unsigned int nused, unsigned int new_sz, gfp_t gfp); +int au_seq_path(struct seq_file *seq, struct path *path); + +/* ---------------------------------------------------------------------- */ + +/* kmem cache */ +enum { + AuCache_DINFO, + AuCache_ICNTNR, + AuCache_FINFO, + AuCache_VDIR, + AuCache_DEHSTR, +#ifdef CONFIG_AUFS_HINOTIFY + AuCache_HINOTIFY, +#endif + AuCache_Last +}; + +#define AuCache(type) KMEM_CACHE(type, SLAB_RECLAIM_ACCOUNT) + +extern struct kmem_cache *au_cachep[]; + +#define AuCacheFuncs(name, index) \ +static inline void *au_cache_alloc_##name(void) \ +{ return kmem_cache_alloc(au_cachep[AuCache_##index], GFP_NOFS); } \ +static inline void au_cache_free_##name(void *p) \ +{ kmem_cache_free(au_cachep[AuCache_##index], p); } + +AuCacheFuncs(dinfo, DINFO); +AuCacheFuncs(icntnr, ICNTNR); +AuCacheFuncs(finfo, FINFO); +AuCacheFuncs(vdir, VDIR); +AuCacheFuncs(dehstr, DEHSTR); + +/* ---------------------------------------------------------------------- */ + +#endif /* __KERNEL__ */ +#endif /* __AUFS_MODULE_H__ */ diff -urN ./fs/aufs/opts.c ./fs/aufs/opts.c --- ./fs/aufs/opts.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/opts.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,1438 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * mount options/flags + */ + +#include /* a distribution requires */ +#include +#include "aufs.h" + +/* ---------------------------------------------------------------------- */ + +enum { + Opt_br, + Opt_add, Opt_del, Opt_mod, Opt_reorder, Opt_append, Opt_prepend, + Opt_idel, Opt_imod, Opt_ireorder, + Opt_dirwh, Opt_rdcache, Opt_deblk, Opt_nhash, Opt_rendir, + Opt_xino, Opt_zxino, Opt_noxino, + Opt_trunc_xino, Opt_trunc_xino_v, Opt_notrunc_xino, + Opt_trunc_xino_path, Opt_itrunc_xino, + Opt_trunc_xib, Opt_notrunc_xib, + Opt_plink, Opt_noplink, Opt_list_plink, + Opt_udba, + /* Opt_lock, Opt_unlock, */ + Opt_cmd, Opt_cmd_args, + Opt_diropq_a, Opt_diropq_w, + Opt_warn_perm, Opt_nowarn_perm, + Opt_wbr_copyup, Opt_wbr_create, + Opt_refrof, Opt_norefrof, + Opt_verbose, Opt_noverbose, + Opt_sum, Opt_nosum, Opt_wsum, + Opt_tail, Opt_ignore, Opt_ignore_silent, Opt_err +}; + +static match_table_t options = { + {Opt_br, "br=%s"}, + {Opt_br, "br:%s"}, + + {Opt_add, "add=%d:%s"}, + {Opt_add, "add:%d:%s"}, + {Opt_add, "ins=%d:%s"}, + {Opt_add, "ins:%d:%s"}, + {Opt_append, "append=%s"}, + {Opt_append, "append:%s"}, + {Opt_prepend, "prepend=%s"}, + {Opt_prepend, "prepend:%s"}, + + {Opt_del, "del=%s"}, + {Opt_del, "del:%s"}, + /* {Opt_idel, "idel:%d"}, */ + {Opt_mod, "mod=%s"}, + {Opt_mod, "mod:%s"}, + /* {Opt_imod, "imod:%d:%s"}, */ + + {Opt_dirwh, "dirwh=%d"}, + + {Opt_xino, "xino=%s"}, + {Opt_noxino, "noxino"}, + {Opt_trunc_xino, "trunc_xino"}, + {Opt_trunc_xino_v, "trunc_xino_v=%d:%d"}, + {Opt_notrunc_xino, "notrunc_xino"}, + {Opt_trunc_xino_path, "trunc_xino=%s"}, + {Opt_itrunc_xino, "itrunc_xino=%d"}, + /* {Opt_zxino, "zxino=%s"}, */ + {Opt_trunc_xib, "trunc_xib"}, + {Opt_notrunc_xib, "notrunc_xib"}, + + {Opt_plink, "plink"}, + {Opt_noplink, "noplink"}, +#ifdef CONFIG_AUFS_DEBUG + {Opt_list_plink, "list_plink"}, +#endif + + {Opt_udba, "udba=%s"}, + + {Opt_diropq_a, "diropq=always"}, + {Opt_diropq_a, "diropq=a"}, + {Opt_diropq_w, "diropq=whiteouted"}, + {Opt_diropq_w, "diropq=w"}, + + {Opt_warn_perm, "warn_perm"}, + {Opt_nowarn_perm, "nowarn_perm"}, + + /* keep them temporary */ + {Opt_ignore_silent, "coo=%s"}, + {Opt_ignore_silent, "nodlgt"}, + {Opt_ignore_silent, "nodirperm1"}, + {Opt_ignore_silent, "noshwh"}, + {Opt_ignore_silent, "noshwh"}, + {Opt_ignore_silent, "clean_plink"}, + + {Opt_rendir, "rendir=%d"}, + + {Opt_refrof, "refrof"}, + {Opt_norefrof, "norefrof"}, + + {Opt_verbose, "verbose"}, + {Opt_verbose, "v"}, + {Opt_noverbose, "noverbose"}, + {Opt_noverbose, "quiet"}, + {Opt_noverbose, "q"}, + {Opt_noverbose, "silent"}, + + {Opt_sum, "sum"}, + {Opt_nosum, "nosum"}, + {Opt_wsum, "wsum"}, + + {Opt_rdcache, "rdcache=%d"}, + {Opt_rdcache, "rdcache:%d"}, + + {Opt_wbr_create, "create=%s"}, + {Opt_wbr_create, "create_policy=%s"}, + {Opt_wbr_copyup, "cpup=%s"}, + {Opt_wbr_copyup, "copyup=%s"}, + {Opt_wbr_copyup, "copyup_policy=%s"}, + + /* internal use for the scripts */ + {Opt_ignore_silent, "si=%s"}, + + {Opt_br, "dirs=%s"}, + {Opt_ignore, "debug=%d"}, + {Opt_ignore, "delete=whiteout"}, + {Opt_ignore, "delete=all"}, + {Opt_ignore, "imap=%s"}, + + {Opt_err, NULL} +}; + +/* ---------------------------------------------------------------------- */ + +static const char *au_parser_pattern(int val, struct match_token *token) +{ + while (token->pattern) { + if (token->token == val) + return token->pattern; + token++; + } + BUG(); + return "??"; +} + +/* ---------------------------------------------------------------------- */ + +static match_table_t brperms = { + {AuBrPerm_RO, AUFS_BRPERM_RO}, + {AuBrPerm_RR, AUFS_BRPERM_RR}, + {AuBrPerm_RW, AUFS_BRPERM_RW}, + + {AuBrPerm_ROWH, AUFS_BRPERM_ROWH}, + {AuBrPerm_RRWH, AUFS_BRPERM_RRWH}, + {AuBrPerm_RWNoLinkWH, AUFS_BRPERM_RWNLWH}, + + {AuBrPerm_ROWH, "nfsro"}, + {AuBrPerm_RO, NULL} +}; + +static int br_perm_val(char *perm) +{ + int val; + substring_t args[MAX_OPT_ARGS]; + + val = match_token(perm, brperms, args); + return val; +} + +const char *au_optstr_br_perm(int brperm) +{ + return au_parser_pattern(brperm, (void *)brperms); +} + +/* ---------------------------------------------------------------------- */ + +static match_table_t udbalevel = { + {AuOpt_UDBA_REVAL, "reval"}, + {AuOpt_UDBA_NONE, "none"}, +#ifdef CONFIG_AUFS_HINOTIFY + {AuOpt_UDBA_HINOTIFY, "inotify"}, +#endif + {-1, NULL} +}; + +static int udba_val(char *str) +{ + substring_t args[MAX_OPT_ARGS]; + + return match_token(str, udbalevel, args); +} + +const char *au_optstr_udba(int udba) +{ + return au_parser_pattern(udba, (void *)udbalevel); +} + +/* ---------------------------------------------------------------------- */ + +static match_table_t au_wbr_create_policy = { + {AuWbrCreate_TDP, "tdp"}, + {AuWbrCreate_TDP, "top-down-parent"}, + {AuWbrCreate_RR, "rr"}, + {AuWbrCreate_RR, "round-robin"}, + {AuWbrCreate_MFS, "mfs"}, + {AuWbrCreate_MFS, "most-free-space"}, + {AuWbrCreate_MFSV, "mfs:%d"}, + {AuWbrCreate_MFSV, "most-free-space:%d"}, + + {AuWbrCreate_MFSRR, "mfsrr:%d"}, + {AuWbrCreate_MFSRRV, "mfsrr:%d:%d"}, + {AuWbrCreate_PMFS, "pmfs"}, + {AuWbrCreate_PMFSV, "pmfs:%d"}, + + {-1, NULL} +}; + +/* cf. linux/lib/parser.c */ +static int au_match_ull(substring_t *s, unsigned long long *result) +{ + int err; + unsigned int len; + char a[32]; + + err = -ERANGE; + len = s->to - s->from; + if (len + 1 <= sizeof(a)) { + memcpy(a, s->from, len); + a[len] = '\0'; + err = strict_strtoull(a, 0, result); + } + return err; +} + +static int au_wbr_mfs_wmark(substring_t *arg, char *str, + struct au_opt_wbr_create *create) +{ + int err; + unsigned long long ull; + + err = 0; + if (!au_match_ull(arg, &ull)) + create->mfsrr_watermark = ull; + else { + AuErr("bad integer in %s\n", str); + err = -EINVAL; + } + + return err; +} + +static int au_wbr_mfs_sec(substring_t *arg, char *str, + struct au_opt_wbr_create *create) +{ + int n, err; + + err = 0; + if (!match_int(arg, &n) && 0 <= n) + create->mfs_second = n; + else { + AuErr("bad integer in %s\n", str); + err = -EINVAL; + } + + return err; +} + +static int au_wbr_create_val(char *str, struct au_opt_wbr_create *create) +{ + int err, e; + substring_t args[MAX_OPT_ARGS]; + + err = match_token(str, au_wbr_create_policy, args); + create->wbr_create = err; + switch (err) { + case AuWbrCreate_MFSRRV: + e = au_wbr_mfs_wmark(&args[0], str, create); + if (!e) + e = au_wbr_mfs_sec(&args[1], str, create); + if (unlikely(e)) + err = e; + break; + case AuWbrCreate_MFSRR: + e = au_wbr_mfs_wmark(&args[0], str, create); + if (unlikely(e)) { + err = e; + break; + } + /*FALLTHROUGH*/ + case AuWbrCreate_MFS: + case AuWbrCreate_PMFS: + create->mfs_second = AUFS_MFS_SECOND_DEF; + break; + case AuWbrCreate_MFSV: + case AuWbrCreate_PMFSV: + e = au_wbr_mfs_sec(&args[0], str, create); + if (unlikely(e)) + err = e; + break; + } + + return err; +} + +const char *au_optstr_wbr_create(int wbr_create) +{ + return au_parser_pattern(wbr_create, (void *)au_wbr_create_policy); +} + +static match_table_t au_wbr_copyup_policy = { + {AuWbrCopyup_TDP, "tdp"}, + {AuWbrCopyup_TDP, "top-down-parent"}, + {AuWbrCopyup_BUP, "bup"}, + {AuWbrCopyup_BUP, "bottom-up-parent"}, + {AuWbrCopyup_BU, "bu"}, + {AuWbrCopyup_BU, "bottom-up"}, + {-1, NULL} +}; + +static int au_wbr_copyup_val(char *str) +{ + substring_t args[MAX_OPT_ARGS]; + + return match_token(str, au_wbr_copyup_policy, args); +} + +const char *au_optstr_wbr_copyup(int wbr_copyup) +{ + return au_parser_pattern(wbr_copyup, (void *)au_wbr_copyup_policy); +} + +/* ---------------------------------------------------------------------- */ + +static const int lkup_dirflags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY; + +static void dump_opts(struct au_opts *opts) +{ +#ifdef CONFIG_AUFS_DEBUG + /* reduce stack space */ + union { + struct au_opt_add *add; + struct au_opt_del *del; + struct au_opt_mod *mod; + struct au_opt_xino *xino; + struct au_opt_xino_itrunc *xino_itrunc; + struct au_opt_wbr_create *create; + } u; + struct au_opt *opt; + + opt = opts->opt; + while (opt->type != Opt_tail) { + switch (opt->type) { + case Opt_add: + u.add = &opt->add; + AuDbg("add {b%d, %s, 0x%x, %p}\n", + u.add->bindex, u.add->pathname, u.add->perm, + u.add->path.dentry); + break; + case Opt_del: + case Opt_idel: + u.del = &opt->del; + AuDbg("del {%s, %p}\n", + u.del->pathname, u.del->h_path.dentry); + break; + case Opt_mod: + case Opt_imod: + u.mod = &opt->mod; + AuDbg("mod {%s, 0x%x, %p}\n", + u.mod->path, u.mod->perm, u.mod->h_root); + break; + case Opt_append: + u.add = &opt->add; + AuDbg("append {b%d, %s, 0x%x, %p}\n", + u.add->bindex, u.add->pathname, u.add->perm, + u.add->path.dentry); + break; + case Opt_prepend: + u.add = &opt->add; + AuDbg("prepend {b%d, %s, 0x%x, %p}\n", + u.add->bindex, u.add->pathname, u.add->perm, + u.add->path.dentry); + break; + case Opt_dirwh: + AuDbg("dirwh %d\n", opt->dirwh); + break; + case Opt_rdcache: + AuDbg("rdcache %d\n", opt->rdcache); + break; + case Opt_xino: + u.xino = &opt->xino; + AuDbg("xino {%s %.*s}\n", + u.xino->path, + AuDLNPair(u.xino->file->f_dentry)); + break; + case Opt_trunc_xino: + AuLabel(trunc_xino); + break; + case Opt_notrunc_xino: + AuLabel(notrunc_xino); + break; + case Opt_trunc_xino_path: + case Opt_itrunc_xino: + u.xino_itrunc = &opt->xino_itrunc; + AuDbg("trunc_xino %d\n", u.xino_itrunc->bindex); + break; + + case Opt_noxino: + AuLabel(noxino); + break; + case Opt_trunc_xib: + AuLabel(trunc_xib); + break; + case Opt_notrunc_xib: + AuLabel(notrunc_xib); + break; + case Opt_plink: + AuLabel(plink); + break; + case Opt_noplink: + AuLabel(noplink); + break; + case Opt_list_plink: + AuLabel(list_plink); + break; + case Opt_udba: + AuDbg("udba %d, %s\n", + opt->udba, au_optstr_udba(opt->udba)); + break; + case Opt_diropq_a: + AuLabel(diropq_a); + break; + case Opt_diropq_w: + AuLabel(diropq_w); + break; + case Opt_warn_perm: + AuLabel(warn_perm); + break; + case Opt_nowarn_perm: + AuLabel(nowarn_perm); + break; + case Opt_refrof: + AuLabel(refrof); + break; + case Opt_norefrof: + AuLabel(norefrof); + break; + case Opt_verbose: + AuLabel(verbose); + break; + case Opt_noverbose: + AuLabel(noverbose); + break; + case Opt_sum: + AuLabel(sum); + break; + case Opt_nosum: + AuLabel(nosum); + break; + case Opt_wsum: + AuLabel(wsum); + break; + case Opt_wbr_create: + u.create = &opt->wbr_create; + AuDbg("create %d, %s\n", u.create->wbr_create, + au_optstr_wbr_create(u.create->wbr_create)); + switch (u.create->wbr_create) { + case AuWbrCreate_MFSV: + case AuWbrCreate_PMFSV: + AuDbg("%d sec\n", u.create->mfs_second); + break; + case AuWbrCreate_MFSRR: + AuDbg("%llu watermark\n", + u.create->mfsrr_watermark); + break; + case AuWbrCreate_MFSRRV: + AuDbg("%llu watermark, %d sec\n", + u.create->mfsrr_watermark, + u.create->mfs_second); + break; + } + break; + case Opt_wbr_copyup: + AuDbg("copyup %d, %s\n", opt->wbr_copyup, + au_optstr_wbr_copyup(opt->wbr_copyup)); + break; + default: + BUG(); + } + opt++; + } +#endif +} + +void au_opts_free(struct au_opts *opts) +{ + struct au_opt *opt; + + opt = opts->opt; + while (opt->type != Opt_tail) { + switch (opt->type) { + case Opt_add: + case Opt_append: + case Opt_prepend: + path_put(&opt->add.path); + break; + case Opt_del: + case Opt_idel: + path_put(&opt->del.h_path); + break; + case Opt_mod: + case Opt_imod: + dput(opt->mod.h_root); + break; + case Opt_xino: + fput(opt->xino.file); + break; + } + opt++; + } +} + +static int opt_add(struct au_opt *opt, char *opt_str, unsigned long sb_flags, + aufs_bindex_t bindex) +{ + int err; + struct au_opt_add *add = &opt->add; + char *p; + + add->bindex = bindex; + add->perm = AuBrPerm_Last; + add->pathname = opt_str; + p = strchr(opt_str, '='); + if (p) { + *p++ = 0; + if (*p) + add->perm = br_perm_val(p); + } + + err = vfsub_kern_path(add->pathname, lkup_dirflags, &add->path); + if (!err) { + if (!p) { + add->perm = AuBrPerm_RO; + if (au_test_fs_rr(add->path.dentry->d_sb)) + add->perm = AuBrPerm_RR; + else if (!bindex && !(sb_flags & MS_RDONLY)) + add->perm = AuBrPerm_RW; + } + opt->type = Opt_add; + goto out; + } + AuErr("lookup failed %s (%d)\n", add->pathname, err); + err = -EINVAL; + + out: + return err; +} + +static int au_opts_parse_del(struct au_opt_del *del, substring_t args[]) +{ + int err; + + del->pathname = args[0].from; + AuDbg("del path %s\n", del->pathname); + + err = vfsub_kern_path(del->pathname, lkup_dirflags, &del->h_path); + if (unlikely(err)) + AuErr("lookup failed %s (%d)\n", del->pathname, err); + + return err; +} + +#if 0 /* reserved for future use */ +static int au_opts_parse_idel(struct super_block *sb, aufs_bindex_t bindex, + struct au_opt_del *del, substring_t args[]) +{ + int err; + struct dentry *root; + + err = -EINVAL; + root = sb->s_root; + aufs_read_lock(root, AuLock_FLUSH); + if (bindex < 0 || au_sbend(sb) < bindex) { + AuErr("out of bounds, %d\n", bindex); + goto out; + } + + err = 0; + del->h_path.dentry = dget(au_h_dptr(root, bindex)); + del->h_path.mnt = mntget(au_sbr_mnt(sb, bindex)); + + out: + aufs_read_unlock(root, !AuLock_IR); + return err; +} +#endif + +static int au_opts_parse_mod(struct au_opt_mod *mod, substring_t args[]) +{ + int err; + struct path path; + char *p; + + err = -EINVAL; + mod->path = args[0].from; + p = strchr(mod->path, '='); + if (unlikely(!p)) { + AuErr("no permssion %s\n", args[0].from); + goto out; + } + + *p++ = 0; + err = vfsub_kern_path(mod->path, lkup_dirflags, &path); + if (unlikely(err)) { + AuErr("lookup failed %s (%d)\n", mod->path, err); + goto out; + } + + mod->perm = br_perm_val(p); + AuDbg("mod path %s, perm 0x%x, %s\n", mod->path, mod->perm, p); + mod->h_root = dget(path.dentry); + path_put(&path); + + out: + return err; +} + +#if 0 /* reserved for future use */ +static int au_opts_parse_imod(struct super_block *sb, aufs_bindex_t bindex, + struct au_opt_mod *mod, substring_t args[]) +{ + int err; + struct dentry *root; + + err = -EINVAL; + root = sb->s_root; + aufs_read_lock(root, AuLock_FLUSH); + if (bindex < 0 || au_sbend(sb) < bindex) { + AuErr("out of bounds, %d\n", bindex); + goto out; + } + + err = 0; + mod->perm = br_perm_val(args[1].from); + AuDbg("mod path %s, perm 0x%x, %s\n", + mod->path, mod->perm, args[1].from); + mod->h_root = dget(au_h_dptr(root, bindex)); + + out: + aufs_read_unlock(root, !AuLock_IR); + return err; +} +#endif + +static int au_opts_parse_xino(struct super_block *sb, struct au_opt_xino *xino, + substring_t args[]) +{ + int err; + struct file *file; + + file = au_xino_create(sb, args[0].from, /*silent*/0); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + + err = -EINVAL; + if (unlikely(file->f_dentry->d_sb == sb)) { + fput(file); + AuErr("%s must be outside\n", args[0].from); + goto out; + } + + err = 0; + xino->file = file; + xino->path = args[0].from; + + out: + return err; +} + +static +int au_opts_parse_xino_itrunc_path(struct super_block *sb, + struct au_opt_xino_itrunc *xino_itrunc, + substring_t args[]) +{ + int err; + aufs_bindex_t bend, bindex; + struct path path; + struct dentry *root; + + err = vfsub_kern_path(args[0].from, lkup_dirflags, &path); + if (unlikely(err)) { + AuErr("lookup failed %s (%d)\n", args[0].from, err); + goto out; + } + + xino_itrunc->bindex = -1; + root = sb->s_root; + aufs_read_lock(root, AuLock_FLUSH); + bend = au_sbend(sb); + for (bindex = 0; bindex <= bend; bindex++) { + if (au_h_dptr(root, bindex) == path.dentry) { + xino_itrunc->bindex = bindex; + break; + } + } + aufs_read_unlock(root, !AuLock_IR); + path_put(&path); + + if (unlikely(xino_itrunc->bindex < 0)) { + AuErr("no such branch %s\n", args[0].from); + err = -EINVAL; + } + + out: + return err; +} + +/* called without aufs lock */ +int au_opts_parse(struct super_block *sb, char *str, struct au_opts *opts) +{ + int err, n, token; + aufs_bindex_t bindex; + unsigned char skipped; + struct dentry *root; + struct au_opt *opt, *opt_tail; + char *opt_str; + /* reduce the stack space */ + union { + struct au_opt_xino_itrunc *xino_itrunc; + struct au_opt_wbr_create *create; + } u; + struct { + substring_t args[MAX_OPT_ARGS]; + } *a; + + err = -ENOMEM; + a = kmalloc(sizeof(*a), GFP_NOFS); + if (unlikely(!a)) + goto out; + + root = sb->s_root; + err = 0; + bindex = 0; + opt = opts->opt; + opt_tail = opt + opts->max_opt - 1; + opt->type = Opt_tail; + while (!err && (opt_str = strsep(&str, ",")) && *opt_str) { + err = -EINVAL; + skipped = 0; + token = match_token(opt_str, options, a->args); + switch (token) { + case Opt_br: + err = 0; + while (!err && (opt_str = strsep(&a->args[0].from, ":")) + && *opt_str) { + err = opt_add(opt, opt_str, opts->sb_flags, + bindex++); + if (unlikely(!err && ++opt > opt_tail)) { + err = -E2BIG; + break; + } + opt->type = Opt_tail; + skipped = 1; + } + break; + case Opt_add: + if (unlikely(match_int(&a->args[0], &n))) { + AuErr("bad integer in %s\n", opt_str); + break; + } + bindex = n; + err = opt_add(opt, a->args[1].from, opts->sb_flags, + bindex); + if (!err) + opt->type = token; + break; + case Opt_append: + err = opt_add(opt, a->args[0].from, opts->sb_flags, + /*dummy bindex*/1); + if (!err) + opt->type = token; + break; + case Opt_prepend: + err = opt_add(opt, a->args[0].from, opts->sb_flags, + /*bindex*/0); + if (!err) + opt->type = token; + break; + case Opt_del: + err = au_opts_parse_del(&opt->del, a->args); + if (!err) + opt->type = token; + break; +#if 0 /* reserved for future use */ + case Opt_idel: + del->pathname = "(indexed)"; + if (unlikely(match_int(&args[0], &n))) { + AuErr("bad integer in %s\n", opt_str); + break; + } + err = au_opts_parse_idel(sb, n, &opt->del, a->args); + if (!err) + opt->type = token; + break; +#endif + case Opt_mod: + err = au_opts_parse_mod(&opt->mod, a->args); + if (!err) + opt->type = token; + break; +#ifdef IMOD /* reserved for future use */ + case Opt_imod: + u.mod->path = "(indexed)"; + if (unlikely(match_int(&a->args[0], &n))) { + AuErr("bad integer in %s\n", opt_str); + break; + } + err = au_opts_parse_imod(sb, n, &opt->mod, a->args); + if (!err) + opt->type = token; + break; +#endif + case Opt_xino: + err = au_opts_parse_xino(sb, &opt->xino, a->args); + if (!err) + opt->type = token; + break; + + case Opt_trunc_xino_path: + err = au_opts_parse_xino_itrunc_path + (sb, &opt->xino_itrunc, a->args); + if (!err) + opt->type = token; + break; + + case Opt_itrunc_xino: + u.xino_itrunc = &opt->xino_itrunc; + if (unlikely(match_int(&a->args[0], &n))) { + AuErr("bad integer in %s\n", opt_str); + break; + } + u.xino_itrunc->bindex = n; + aufs_read_lock(root, AuLock_FLUSH); + if (n < 0 || au_sbend(sb) < n) { + AuErr("out of bounds, %d\n", n); + aufs_read_unlock(root, !AuLock_IR); + break; + } + aufs_read_unlock(root, !AuLock_IR); + err = 0; + opt->type = token; + break; + + case Opt_dirwh: + if (unlikely(match_int(&a->args[0], &opt->dirwh))) + break; + err = 0; + opt->type = token; + break; + + case Opt_rdcache: + if (unlikely(match_int(&a->args[0], &opt->rdcache))) + break; + err = 0; + opt->type = token; + break; + + case Opt_trunc_xino: + case Opt_notrunc_xino: + case Opt_noxino: + case Opt_trunc_xib: + case Opt_notrunc_xib: + case Opt_plink: + case Opt_noplink: + case Opt_list_plink: + case Opt_diropq_a: + case Opt_diropq_w: + case Opt_warn_perm: + case Opt_nowarn_perm: + case Opt_refrof: + case Opt_norefrof: + case Opt_verbose: + case Opt_noverbose: + case Opt_sum: + case Opt_nosum: + case Opt_wsum: + err = 0; + opt->type = token; + break; + + case Opt_udba: + opt->udba = udba_val(a->args[0].from); + if (opt->udba >= 0) { + err = 0; + opt->type = token; + } else + AuErr("wrong value, %s\n", opt_str); + break; + + case Opt_wbr_create: + u.create = &opt->wbr_create; + u.create->wbr_create + = au_wbr_create_val(a->args[0].from, u.create); + if (u.create->wbr_create >= 0) { + err = 0; + opt->type = token; + } else + AuErr("wrong value, %s\n", opt_str); + break; + case Opt_wbr_copyup: + opt->wbr_copyup = au_wbr_copyup_val(a->args[0].from); + if (opt->wbr_copyup >= 0) { + err = 0; + opt->type = token; + } else + AuErr("wrong value, %s\n", opt_str); + break; + + case Opt_ignore: + AuWarn("ignored %s\n", opt_str); + /*FALLTHROUGH*/ + case Opt_ignore_silent: + skipped = 1; + err = 0; + break; + case Opt_err: + AuErr("unknown option %s\n", opt_str); + break; + } + + if (!err && !skipped) { + if (unlikely(++opt > opt_tail)) { + err = -E2BIG; + opt--; + opt->type = Opt_tail; + break; + } + opt->type = Opt_tail; + } + } + + kfree(a); + dump_opts(opts); + if (unlikely(err)) + au_opts_free(opts); + + out: + return err; +} + +static int au_opt_wbr_create(struct super_block *sb, + struct au_opt_wbr_create *create) +{ + int err; + struct au_sbinfo *sbinfo; + + err = 1; /* handled */ + sbinfo = au_sbi(sb); + if (sbinfo->si_wbr_create_ops->fin) { + err = sbinfo->si_wbr_create_ops->fin(sb); + if (!err) + err = 1; + } + + sbinfo->si_wbr_create = create->wbr_create; + sbinfo->si_wbr_create_ops = au_wbr_create_ops + create->wbr_create; + switch (create->wbr_create) { + case AuWbrCreate_MFSRRV: + case AuWbrCreate_MFSRR: + sbinfo->si_wbr_mfs.mfsrr_watermark = create->mfsrr_watermark; + /*FALLTHROUGH*/ + case AuWbrCreate_MFS: + case AuWbrCreate_MFSV: + case AuWbrCreate_PMFS: + case AuWbrCreate_PMFSV: + sbinfo->si_wbr_mfs.mfs_expire = create->mfs_second * HZ; + break; + } + + if (sbinfo->si_wbr_create_ops->init) + sbinfo->si_wbr_create_ops->init(sb); /* ignore */ + + return err; +} + +/* + * returns, + * plus: processed without an error + * zero: unprocessed + */ +static int au_opt_simple(struct super_block *sb, struct au_opt *opt, + struct au_opts *opts) +{ + int err; + struct au_sbinfo *sbinfo; + + err = 1; /* handled */ + sbinfo = au_sbi(sb); + switch (opt->type) { + case Opt_udba: + sbinfo->si_mntflags &= ~AuOptMask_UDBA; + sbinfo->si_mntflags |= opt->udba; + opts->given_udba |= opt->udba; + break; + + case Opt_plink: + au_opt_set(sbinfo->si_mntflags, PLINK); + break; + case Opt_noplink: + if (au_opt_test(sbinfo->si_mntflags, PLINK)) + au_plink_put(sb); + au_opt_clr(sbinfo->si_mntflags, PLINK); + break; + case Opt_list_plink: + if (au_opt_test(sbinfo->si_mntflags, PLINK)) + au_plink_list(sb); + break; + + case Opt_diropq_a: + au_opt_set(sbinfo->si_mntflags, ALWAYS_DIROPQ); + break; + case Opt_diropq_w: + au_opt_clr(sbinfo->si_mntflags, ALWAYS_DIROPQ); + break; + + case Opt_warn_perm: + au_opt_set(sbinfo->si_mntflags, WARN_PERM); + break; + case Opt_nowarn_perm: + au_opt_clr(sbinfo->si_mntflags, WARN_PERM); + break; + + case Opt_refrof: + au_opt_set(sbinfo->si_mntflags, REFROF); + break; + case Opt_norefrof: + au_opt_clr(sbinfo->si_mntflags, REFROF); + break; + + case Opt_verbose: + au_opt_set(sbinfo->si_mntflags, VERBOSE); + break; + case Opt_noverbose: + au_opt_clr(sbinfo->si_mntflags, VERBOSE); + break; + + case Opt_sum: + au_opt_set(sbinfo->si_mntflags, SUM); + break; + case Opt_wsum: + au_opt_clr(sbinfo->si_mntflags, SUM); + au_opt_set(sbinfo->si_mntflags, SUM_W); + case Opt_nosum: + au_opt_clr(sbinfo->si_mntflags, SUM); + au_opt_clr(sbinfo->si_mntflags, SUM_W); + break; + + case Opt_wbr_create: + err = au_opt_wbr_create(sb, &opt->wbr_create); + break; + case Opt_wbr_copyup: + sbinfo->si_wbr_copyup = opt->wbr_copyup; + sbinfo->si_wbr_copyup_ops = au_wbr_copyup_ops + opt->wbr_copyup; + break; + + case Opt_dirwh: + sbinfo->si_dirwh = opt->dirwh; + break; + + case Opt_rdcache: + sbinfo->si_rdcache = opt->rdcache * HZ; + break; + + case Opt_trunc_xino: + au_opt_set(sbinfo->si_mntflags, TRUNC_XINO); + break; + case Opt_notrunc_xino: + au_opt_clr(sbinfo->si_mntflags, TRUNC_XINO); + break; + + case Opt_trunc_xino_path: + case Opt_itrunc_xino: + err = au_xino_trunc(sb, opt->xino_itrunc.bindex); + if (!err) + err = 1; + break; + + case Opt_trunc_xib: + au_fset_opts(opts->flags, TRUNC_XIB); + break; + case Opt_notrunc_xib: + au_fclr_opts(opts->flags, TRUNC_XIB); + break; + + default: + err = 0; + break; + } + + return err; +} + +/* + * returns tri-state. + * plus: processed without an error + * zero: unprocessed + * minus: error + */ +static int au_opt_br(struct super_block *sb, struct au_opt *opt, + struct au_opts *opts) +{ + int err, do_refresh; + + err = 0; + switch (opt->type) { + case Opt_append: + opt->add.bindex = au_sbend(sb) + 1; + if (opt->add.bindex < 0) + opt->add.bindex = 0; + goto add; + case Opt_prepend: + opt->add.bindex = 0; + add: + case Opt_add: + err = au_br_add(sb, &opt->add, + au_ftest_opts(opts->flags, REMOUNT)); + if (!err) { + err = 1; + au_fset_opts(opts->flags, REFRESH_DIR); + if (au_br_whable(opt->add.perm)) + au_fset_opts(opts->flags, REFRESH_NONDIR); + } + break; + + case Opt_del: + case Opt_idel: + err = au_br_del(sb, &opt->del, + au_ftest_opts(opts->flags, REMOUNT)); + if (!err) { + err = 1; + au_fset_opts(opts->flags, TRUNC_XIB); + au_fset_opts(opts->flags, REFRESH_DIR); + au_fset_opts(opts->flags, REFRESH_NONDIR); + } + break; + + case Opt_mod: + case Opt_imod: + err = au_br_mod(sb, &opt->mod, + au_ftest_opts(opts->flags, REMOUNT), + &do_refresh); + if (!err) { + err = 1; + if (do_refresh) { + au_fset_opts(opts->flags, REFRESH_DIR); + au_fset_opts(opts->flags, REFRESH_NONDIR); + } + } + break; + } + + return err; +} + +static int au_opt_xino(struct super_block *sb, struct au_opt *opt, + struct au_opt_xino **opt_xino, + struct au_opts *opts) +{ + int err; + aufs_bindex_t bend, bindex; + struct dentry *root, *parent, *h_root; + + err = 0; + switch (opt->type) { + case Opt_xino: + err = au_xino_set(sb, &opt->xino, + !!au_ftest_opts(opts->flags, REMOUNT)); + if (unlikely(err)) + break; + + *opt_xino = &opt->xino; + au_xino_brid_set(sb, -1); + + /* safe d_parent access */ + parent = opt->xino.file->f_dentry->d_parent; + root = sb->s_root; + bend = au_sbend(sb); + for (bindex = 0; bindex <= bend; bindex++) { + h_root = au_h_dptr(root, bindex); + if (h_root == parent) { + au_xino_brid_set(sb, au_sbr_id(sb, bindex)); + break; + } + } + break; + + case Opt_noxino: + au_xino_clr(sb); + au_xino_brid_set(sb, -1); + *opt_xino = (void *)-1; + break; + } + + return err; +} + +int au_opts_verify(struct super_block *sb, unsigned long sb_flags, + unsigned int pending) +{ + int err; + aufs_bindex_t bindex, bend; + unsigned char do_plink, skip, do_free; + struct au_branch *br; + struct au_wbr *wbr; + struct dentry *root; + struct inode *dir, *h_dir; + struct au_sbinfo *sbinfo; + struct au_hinode *hdir; + + sbinfo = au_sbi(sb); + AuDebugOn(!(sbinfo->si_mntflags & AuOptMask_UDBA)); + + if (unlikely(!(sb_flags & MS_RDONLY) + && !au_br_writable(au_sbr_perm(sb, 0)))) + AuWarn("first branch should be rw\n"); + + if (au_opt_test((sbinfo->si_mntflags | pending), UDBA_HINOTIFY) + && !au_opt_test(sbinfo->si_mntflags, XINO)) + AuWarn("udba=inotify requires xino\n"); + + err = 0; + root = sb->s_root; + dir = sb->s_root->d_inode; + do_plink = !!au_opt_test(sbinfo->si_mntflags, PLINK); + bend = au_sbend(sb); + for (bindex = 0; !err && bindex <= bend; bindex++) { + skip = 0; + h_dir = au_h_iptr(dir, bindex); + br = au_sbr(sb, bindex); + do_free = 0; + + wbr = br->br_wbr; + if (wbr) + wbr_wh_read_lock(wbr); + + switch (br->br_perm) { + case AuBrPerm_RO: + case AuBrPerm_ROWH: + case AuBrPerm_RR: + case AuBrPerm_RRWH: + do_free = !!wbr; + skip = (!wbr + || (!wbr->wbr_whbase + && !wbr->wbr_plink + && !wbr->wbr_orph)); + break; + + case AuBrPerm_RWNoLinkWH: + /* skip = (!br->br_whbase && !br->br_orph); */ + skip = (!wbr || !wbr->wbr_whbase); + if (skip && wbr) { + if (do_plink) + skip = !!wbr->wbr_plink; + else + skip = !wbr->wbr_plink; + } + break; + + case AuBrPerm_RW: + /* skip = (br->br_whbase && br->br_ohph); */ + skip = (wbr && wbr->wbr_whbase); + if (skip) { + if (do_plink) + skip = !!wbr->wbr_plink; + else + skip = !wbr->wbr_plink; + } + break; + + default: + BUG(); + } + if (wbr) + wbr_wh_read_unlock(wbr); + + if (skip) + continue; + + hdir = au_hi(dir, bindex); + au_hin_imtx_lock_nested(hdir, AuLsc_I_PARENT); + if (wbr) + wbr_wh_write_lock(wbr); + err = au_wh_init(au_h_dptr(root, bindex), br, sb); + if (wbr) + wbr_wh_write_unlock(wbr); + au_hin_imtx_unlock(hdir); + + if (!err && do_free) { + kfree(wbr); + br->br_wbr = NULL; + } + } + + return err; +} + +int au_opts_mount(struct super_block *sb, struct au_opts *opts) +{ + int err; + unsigned int tmp; + aufs_bindex_t bend; + struct au_opt *opt; + struct au_opt_xino *opt_xino, xino; + struct au_sbinfo *sbinfo; + + err = 0; + opt_xino = NULL; + opt = opts->opt; + while (err >= 0 && opt->type != Opt_tail) + err = au_opt_simple(sb, opt++, opts); + if (err > 0) + err = 0; + else if (unlikely(err < 0)) + goto out; + + /* disable xino and udba temporary */ + sbinfo = au_sbi(sb); + tmp = sbinfo->si_mntflags; + au_opt_clr(sbinfo->si_mntflags, XINO); + au_opt_set_udba(sbinfo->si_mntflags, UDBA_REVAL); + + opt = opts->opt; + while (err >= 0 && opt->type != Opt_tail) + err = au_opt_br(sb, opt++, opts); + if (err > 0) + err = 0; + else if (unlikely(err < 0)) + goto out; + + bend = au_sbend(sb); + if (unlikely(bend < 0)) { + err = -EINVAL; + AuErr("no branches\n"); + goto out; + } + + if (au_opt_test(tmp, XINO)) + au_opt_set(sbinfo->si_mntflags, XINO); + opt = opts->opt; + while (!err && opt->type != Opt_tail) + err = au_opt_xino(sb, opt++, &opt_xino, opts); + if (unlikely(err)) + goto out; + + err = au_opts_verify(sb, sb->s_flags, tmp); + if (unlikely(err)) + goto out; + + /* restore xino */ + if (au_opt_test(tmp, XINO) && !opt_xino) { + xino.file = au_xino_def(sb); + err = PTR_ERR(xino.file); + if (IS_ERR(xino.file)) + goto out; + + err = au_xino_set(sb, &xino, /*remount*/0); + fput(xino.file); + if (unlikely(err)) + goto out; + } + + /* restore udba */ + sbinfo->si_mntflags &= ~AuOptMask_UDBA; + sbinfo->si_mntflags |= (tmp & AuOptMask_UDBA); + if (au_opt_test(tmp, UDBA_HINOTIFY)) { + struct inode *dir = sb->s_root->d_inode; + au_reset_hinotify(dir, + au_hi_flags(dir, /*isdir*/1) & ~AuHi_XINO); + } + + out: + return err; +} + +int au_opts_remount(struct super_block *sb, struct au_opts *opts) +{ + int err, rerr; + struct inode *dir; + struct au_opt_xino *opt_xino; + struct au_opt *opt; + struct au_sbinfo *sbinfo; + + dir = sb->s_root->d_inode; + sbinfo = au_sbi(sb); + err = 0; + opt_xino = NULL; + opt = opts->opt; + while (err >= 0 && opt->type != Opt_tail) { + err = au_opt_simple(sb, opt, opts); + if (!err) + err = au_opt_br(sb, opt, opts); + if (!err) + err = au_opt_xino(sb, opt, &opt_xino, opts); + opt++; + } + if (err > 0) + err = 0; + AuTraceErr(err); + /* go on even err */ + + rerr = au_opts_verify(sb, opts->sb_flags, /*pending*/0); + if (unlikely(rerr && !err)) + err = rerr; + + if (au_ftest_opts(opts->flags, TRUNC_XIB)) { + rerr = au_xib_trunc(sb); + if (unlikely(rerr && !err)) + err = rerr; + } + + /* will be handled by the caller */ + if (!au_ftest_opts(opts->flags, REFRESH_DIR) + && (opts->given_udba || au_opt_test(sbinfo->si_mntflags, XINO))) + au_fset_opts(opts->flags, REFRESH_DIR); + + AuDbg("status 0x%x\n", opts->flags); + return err; +} + +/* ---------------------------------------------------------------------- */ + +unsigned int au_opt_udba(struct super_block *sb) +{ + return au_mntflags(sb) & AuOptMask_UDBA; +} diff -urN ./fs/aufs/opts.h ./fs/aufs/opts.h --- ./fs/aufs/opts.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/opts.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,180 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * mount options/flags + */ + +#ifndef __AUFS_OPTS_H__ +#define __AUFS_OPTS_H__ + +#ifdef __KERNEL__ + +#include +#include +#include + +/* ---------------------------------------------------------------------- */ + +/* mount flags */ +#define AuOpt_XINO 1 /* external inode number bitmap + and translation table */ +#define AuOpt_TRUNC_XINO (1 << 1) /* truncate xino files */ +#define AuOpt_UDBA_NONE (1 << 2) /* users direct branch access */ +#define AuOpt_UDBA_REVAL (1 << 3) +#define AuOpt_UDBA_HINOTIFY (1 << 4) +#define AuOpt_PLINK (1 << 5) /* pseudo-link */ +#define AuOpt_DIRPERM1 (1 << 6) /* unimplemented */ +#define AuOpt_REFROF (1 << 7) /* unimplemented */ +#define AuOpt_ALWAYS_DIROPQ (1 << 8) /* policy to creating diropq */ +#define AuOpt_SUM (1 << 9) /* summation for statfs(2) */ +#define AuOpt_SUM_W (1 << 10) /* unimplemented */ +#define AuOpt_WARN_PERM (1 << 11) /* warn when add-branch */ +#define AuOpt_VERBOSE (1 << 12) /* busy inode when del-branch */ + +#ifndef CONFIG_AUFS_HINOTIFY +#undef AuOpt_UDBA_HINOTIFY +#define AuOpt_UDBA_HINOTIFY 0 +#endif + +#define AuOpt_Def (AuOpt_XINO \ + | AuOpt_UDBA_REVAL \ + | AuOpt_PLINK \ + /* | AuOpt_DIRPERM1 */ \ + | AuOpt_WARN_PERM) +#define AuOptMask_UDBA (AuOpt_UDBA_NONE \ + | AuOpt_UDBA_REVAL \ + | AuOpt_UDBA_HINOTIFY) + +#define au_opt_test(flags, name) (flags & AuOpt_##name) +#define au_opt_set(flags, name) do { \ + BUILD_BUG_ON(AuOpt_##name & AuOptMask_UDBA); \ + ((flags) |= AuOpt_##name); \ +} while (0) +#define au_opt_set_udba(flags, name) do { \ + (flags) &= ~AuOptMask_UDBA; \ + ((flags) |= AuOpt_##name); \ +} while (0) +#define au_opt_clr(flags, name) { ((flags) &= ~AuOpt_##name); } + +/* ---------------------------------------------------------------------- */ + +/* policies to select one among multiple writable branches */ +enum { + AuWbrCreate_TDP, /* top down parent */ + AuWbrCreate_RR, /* round robin */ + AuWbrCreate_MFS, /* most free space */ + AuWbrCreate_MFSV, /* mfs with seconds */ + AuWbrCreate_MFSRR, /* mfs then rr */ + AuWbrCreate_MFSRRV, /* mfs then rr with seconds */ + AuWbrCreate_PMFS, /* parent and mfs */ + AuWbrCreate_PMFSV, /* parent and mfs with seconds */ + + AuWbrCreate_Def = AuWbrCreate_TDP +}; + +enum { + AuWbrCopyup_TDP, /* top down parent */ + AuWbrCopyup_BUP, /* bottom up parent */ + AuWbrCopyup_BU, /* bottom up */ + + AuWbrCopyup_Def = AuWbrCopyup_TDP +}; + +/* ---------------------------------------------------------------------- */ + +struct au_opt_add { + aufs_bindex_t bindex; + char *pathname; + int perm; + struct path path; +}; + +struct au_opt_del { + char *pathname; + struct path h_path; +}; + +struct au_opt_mod { + char *path; + int perm; + struct dentry *h_root; +}; + +struct au_opt_xino { + char *path; + struct file *file; +}; + +struct au_opt_xino_itrunc { + aufs_bindex_t bindex; +}; + +struct au_opt_wbr_create { + int wbr_create; + int mfs_second; + unsigned long long mfsrr_watermark; +}; + +struct au_opt { + int type; + union { + struct au_opt_xino xino; + struct au_opt_xino_itrunc xino_itrunc; + struct au_opt_add add; + struct au_opt_del del; + struct au_opt_mod mod; + int dirwh; + int rdcache; + int deblk; + int nhash; + int udba; + struct au_opt_wbr_create wbr_create; + int wbr_copyup; + }; +}; + +/* opts flags */ +#define AuOpts_REMOUNT 1 +#define AuOpts_REFRESH_DIR (1 << 1) +#define AuOpts_REFRESH_NONDIR (1 << 2) +#define AuOpts_TRUNC_XIB (1 << 3) +#define au_ftest_opts(flags, name) ((flags) & AuOpts_##name) +#define au_fset_opts(flags, name) { (flags) |= AuOpts_##name; } +#define au_fclr_opts(flags, name) { (flags) &= ~AuOpts_##name; } + +struct au_opts { + struct au_opt *opt; + int max_opt; + + unsigned int given_udba; + unsigned int flags; + unsigned long sb_flags; +}; + +/* ---------------------------------------------------------------------- */ + +const char *au_optstr_br_perm(int brperm); +const char *au_optstr_udba(int udba); +const char *au_optstr_wbr_copyup(int wbr_copyup); +const char *au_optstr_wbr_create(int wbr_create); + +void au_opts_free(struct au_opts *opts); +int au_opts_parse(struct super_block *sb, char *str, struct au_opts *opts); +int au_opts_verify(struct super_block *sb, unsigned long sb_flags, + unsigned int pending); +int au_opts_mount(struct super_block *sb, struct au_opts *opts); +int au_opts_remount(struct super_block *sb, struct au_opts *opts); + +unsigned int au_opt_udba(struct super_block *sb); + +/* ---------------------------------------------------------------------- */ + +#endif /* __KERNEL__ */ +#endif /* __AUFS_OPTS_H__ */ diff -urN ./fs/aufs/plink.c ./fs/aufs/plink.c --- ./fs/aufs/plink.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/plink.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,335 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * pseudo-link + */ + +#include "aufs.h" + +/* + * during a user process maintains the pseudo-links, + * prohibit adding a new plink and branch manipulation. + */ +void au_plink_block_maintain(struct super_block *sb) +{ + struct au_sbinfo *sbi = au_sbi(sb); + /* gave up wake_up_bit() */ + wait_event(sbi->si_plink_wq, !au_ftest_si(sbi, MAINTAIN_PLINK)); +} + +/* ---------------------------------------------------------------------- */ + +struct pseudo_link { + struct list_head list; + struct inode *inode; +}; + +#ifdef CONFIG_AUFS_DEBUG +void au_plink_list(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + struct list_head *plink_list; + struct pseudo_link *plink; + + sbinfo = au_sbi(sb); + AuDebugOn(!au_opt_test(au_mntflags(sb), PLINK)); + + plink_list = &sbinfo->si_plink.head; + spin_lock(&sbinfo->si_plink.spin); + list_for_each_entry(plink, plink_list, list) + AuDbg("%lu\n", plink->inode->i_ino); + spin_unlock(&sbinfo->si_plink.spin); +} +#endif + +/* is the inode pseudo-linked? */ +int au_plink_test(struct inode *inode) +{ + int found; + struct au_sbinfo *sbinfo; + struct list_head *plink_list; + struct pseudo_link *plink; + + sbinfo = au_sbi(inode->i_sb); + AuDebugOn(!au_opt_test(au_mntflags(inode->i_sb), PLINK)); + + found = 0; + plink_list = &sbinfo->si_plink.head; + spin_lock(&sbinfo->si_plink.spin); + list_for_each_entry(plink, plink_list, list) + if (plink->inode == inode) { + found = 1; + break; + } + spin_unlock(&sbinfo->si_plink.spin); + return found; +} + +/* ---------------------------------------------------------------------- */ + +/* + * generate a name for plink. + * the file will be stored under AUFS_WH_PLINKDIR. + */ +/* 20 is max digits length of ulong 64 */ +#define PLINK_NAME_LEN ((20 + 1) * 2) + +static int plink_name(char *name, int len, struct inode *inode, + aufs_bindex_t bindex) +{ + int rlen; + struct inode *h_inode; + + h_inode = au_h_iptr(inode, bindex); + rlen = snprintf(name, len, "%lu.%lu", inode->i_ino, h_inode->i_ino); + return rlen; +} + +/* lookup the plink-ed @inode under the branch at @bindex */ +struct dentry *au_plink_lkup(struct inode *inode, aufs_bindex_t bindex) +{ + struct dentry *h_dentry, *h_parent; + struct au_branch *br; + struct inode *h_dir; + char a[PLINK_NAME_LEN]; + struct qstr tgtname = { + .name = a + }; + + br = au_sbr(inode->i_sb, bindex); + h_parent = br->br_wbr->wbr_plink; + h_dir = h_parent->d_inode; + tgtname.len = plink_name(a, sizeof(a), inode, bindex); + + /* always superio. */ + mutex_lock_nested(&h_dir->i_mutex, AuLsc_I_CHILD2); + h_dentry = au_sio_lkup_one(&tgtname, h_parent, br); + mutex_unlock(&h_dir->i_mutex); + return h_dentry; +} + +/* create a pseudo-link */ +static int do_whplink(struct qstr *tgt, struct dentry *h_parent, + struct dentry *h_dentry, struct au_branch *br) +{ + int err; + struct path h_path = { + .mnt = br->br_mnt + }; + struct inode *h_dir; + + h_dir = h_parent->d_inode; + again: + h_path.dentry = au_lkup_one(tgt, h_parent, br, /*nd*/NULL); + err = PTR_ERR(h_path.dentry); + if (IS_ERR(h_path.dentry)) + goto out; + + err = 0; + /* wh.plink dir is not monitored */ + if (h_path.dentry->d_inode + && h_path.dentry->d_inode != h_dentry->d_inode) { + err = vfsub_unlink(h_dir, &h_path, /*force*/0); + dput(h_path.dentry); + h_path.dentry = NULL; + if (!err) + goto again; + } + if (!err && !h_path.dentry->d_inode) + err = vfsub_link(h_dentry, h_dir, &h_path); + dput(h_path.dentry); + + out: + return err; +} + +struct do_whplink_args { + int *errp; + struct qstr *tgt; + struct dentry *h_parent; + struct dentry *h_dentry; + struct au_branch *br; +}; + +static void call_do_whplink(void *args) +{ + struct do_whplink_args *a = args; + *a->errp = do_whplink(a->tgt, a->h_parent, a->h_dentry, a->br); +} + +static int whplink(struct dentry *h_dentry, struct inode *inode, + aufs_bindex_t bindex, struct au_branch *br) +{ + int err, wkq_err; + struct au_wbr *wbr; + struct dentry *h_parent; + struct inode *h_dir; + char a[PLINK_NAME_LEN]; + struct qstr tgtname = { + .name = a + }; + + wbr = au_sbr(inode->i_sb, bindex)->br_wbr; + h_parent = wbr->wbr_plink; + h_dir = h_parent->d_inode; + tgtname.len = plink_name(a, sizeof(a), inode, bindex); + + /* always superio. */ + mutex_lock_nested(&h_dir->i_mutex, AuLsc_I_CHILD2); + if (!au_test_wkq(current)) { + struct do_whplink_args args = { + .errp = &err, + .tgt = &tgtname, + .h_parent = h_parent, + .h_dentry = h_dentry, + .br = br + }; + wkq_err = au_wkq_wait(call_do_whplink, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } else + err = do_whplink(&tgtname, h_parent, h_dentry, br); + mutex_unlock(&h_dir->i_mutex); + + return err; +} + +/* free a single plink */ +static void do_put_plink(struct pseudo_link *plink, int do_del) +{ + iput(plink->inode); + if (do_del) + list_del(&plink->list); + kfree(plink); +} + +/* + * create a new pseudo-link for @h_dentry on @bindex. + * the linked inode is held in aufs @inode. + */ +void au_plink_append(struct inode *inode, aufs_bindex_t bindex, + struct dentry *h_dentry) +{ + struct super_block *sb; + struct au_sbinfo *sbinfo; + struct list_head *plink_list; + struct pseudo_link *plink; + int found, err, cnt; + + sb = inode->i_sb; + sbinfo = au_sbi(sb); + AuDebugOn(!au_opt_test(au_mntflags(sb), PLINK)); + + err = 0; + cnt = 0; + found = 0; + plink_list = &sbinfo->si_plink.head; + spin_lock(&sbinfo->si_plink.spin); + list_for_each_entry(plink, plink_list, list) { + cnt++; + if (plink->inode == inode) { + found = 1; + break; + } + } + if (found) { + spin_unlock(&sbinfo->si_plink.spin); + return; + } + + plink = NULL; + if (!found) { + plink = kmalloc(sizeof(*plink), GFP_ATOMIC); + if (plink) { + plink->inode = au_igrab(inode); + list_add(&plink->list, plink_list); + cnt++; + } else + err = -ENOMEM; + } + spin_unlock(&sbinfo->si_plink.spin); + + if (!err) { + au_plink_block_maintain(sb); + err = whplink(h_dentry, inode, bindex, au_sbr(sb, bindex)); + } + + if (unlikely(cnt > AUFS_PLINK_WARN)) + AuWarn1("unexpectedly many pseudo links, %d\n", cnt); + if (unlikely(err)) { + AuWarn("err %d, damaged pseudo link.\n", err); + if (!found && plink) + do_put_plink(plink, /*do_del*/1); + } +} + +/* free all plinks */ +void au_plink_put(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + struct list_head *plink_list; + struct pseudo_link *plink, *tmp; + + sbinfo = au_sbi(sb); + AuDebugOn(!au_opt_test(au_mntflags(sb), PLINK)); + + plink_list = &sbinfo->si_plink.head; + /* no spin_lock since sbinfo is write-locked */ + list_for_each_entry_safe(plink, tmp, plink_list, list) + do_put_plink(plink, 0); + INIT_LIST_HEAD(plink_list); +} + +/* free the plinks on a branch specified by @br_id */ +void au_plink_half_refresh(struct super_block *sb, aufs_bindex_t br_id) +{ + struct au_sbinfo *sbinfo; + struct list_head *plink_list; + struct pseudo_link *plink, *tmp; + struct inode *inode; + aufs_bindex_t bstart, bend, bindex; + unsigned char do_put; + + sbinfo = au_sbi(sb); + AuDebugOn(!au_opt_test(au_mntflags(sb), PLINK)); + + plink_list = &sbinfo->si_plink.head; + /* no spin_lock since sbinfo is write-locked */ + list_for_each_entry_safe(plink, tmp, plink_list, list) { + do_put = 0; + inode = au_igrab(plink->inode); + ii_write_lock_child(inode); + bstart = au_ibstart(inode); + bend = au_ibend(inode); + if (bstart >= 0) { + for (bindex = bstart; bindex <= bend; bindex++) { + if (!au_h_iptr(inode, bindex) + || au_ii_br_id(inode, bindex) != br_id) + continue; + au_set_h_iptr(inode, bindex, NULL, 0); + do_put = 1; + break; + } + } else + do_put_plink(plink, 1); + + if (do_put) { + for (bindex = bstart; bindex <= bend; bindex++) + if (au_h_iptr(inode, bindex)) { + do_put = 0; + break; + } + if (do_put) + do_put_plink(plink, 1); + } + ii_write_unlock(inode); + iput(inode); + } +} diff -urN ./fs/aufs/rwsem.h ./fs/aufs/rwsem.h --- ./fs/aufs/rwsem.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/rwsem.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,52 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * simple read-write semaphore wrappers + */ + +#ifndef __AUFS_RWSEM_H__ +#define __AUFS_RWSEM_H__ + +#ifdef __KERNEL__ + +#include + +#define au_rwsem_destroy(rw) AuDebugOn(rwsem_is_locked(rw)) +#define AuRwMustNoWaiters(rw) AuDebugOn(!list_empty(&(rw)->wait_list)) + +#define AuSimpleLockRwsemFuncs(prefix, param, rwsem) \ +static inline void prefix##_read_lock(param) \ +{ down_read(rwsem); } \ +static inline void prefix##_write_lock(param) \ +{ down_write(rwsem); } \ +static inline int prefix##_read_trylock(param) \ +{ return down_read_trylock(rwsem); } \ +static inline int prefix##_write_trylock(param) \ +{ return down_write_trylock(rwsem); } +/* why is not _nested version defined */ +/* static inline void prefix##_read_trylock_nested(param, lsc) +{ down_write_trylock_nested(rwsem, lsc)); } +static inline void prefix##_write_trylock_nestd(param, lsc) +{ down_write_trylock_nested(rwsem, lsc); } */ + +#define AuSimpleUnlockRwsemFuncs(prefix, param, rwsem) \ +static inline void prefix##_read_unlock(param) \ +{ up_read(rwsem); } \ +static inline void prefix##_write_unlock(param) \ +{ up_write(rwsem); } \ +static inline void prefix##_downgrade_lock(param) \ +{ downgrade_write(rwsem); } + +#define AuSimpleRwsemFuncs(prefix, param, rwsem) \ + AuSimpleLockRwsemFuncs(prefix, param, rwsem) \ + AuSimpleUnlockRwsemFuncs(prefix, param, rwsem) + +#endif /* __KERNEL__ */ +#endif /* __AUFS_RWSEM_H__ */ diff -urN ./fs/aufs/sbinfo.c ./fs/aufs/sbinfo.c --- ./fs/aufs/sbinfo.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/sbinfo.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,192 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * superblock private data + */ + +#include "aufs.h" + +/* + * they are necessary regardless sysfs is disabled. + */ +void au_si_free(struct kobject *kobj) +{ + struct au_sbinfo *sbinfo; + struct super_block *sb; + + sbinfo = container_of(kobj, struct au_sbinfo, si_kobj); + AuDebugOn(!list_empty(&sbinfo->si_plink.head)); + + sb = sbinfo->si_sb; + si_write_lock(sb); + au_xino_clr(sb); + au_br_free(sbinfo); + kfree(sbinfo->si_branch); + mutex_destroy(&sbinfo->si_xib_mtx); + si_write_unlock(sb); + au_rwsem_destroy(&sbinfo->si_rwsem); + + kfree(sbinfo); +} + +int au_si_alloc(struct super_block *sb) +{ + int err; + struct au_sbinfo *sbinfo; + + err = -ENOMEM; + sbinfo = kmalloc(sizeof(*sbinfo), GFP_NOFS); + if (unlikely(!sbinfo)) + goto out; + + /* will be reallocated separately */ + sbinfo->si_branch = kzalloc(sizeof(*sbinfo->si_branch), GFP_NOFS); + if (unlikely(!sbinfo->si_branch)) + goto out_sbinfo; + + memset(&sbinfo->si_kobj, 0, sizeof(sbinfo->si_kobj)); + err = sysaufs_si_init(sbinfo); + if (unlikely(err)) + goto out_br; + + au_nwt_init(&sbinfo->si_nowait); + init_rwsem(&sbinfo->si_rwsem); + down_write(&sbinfo->si_rwsem); + sbinfo->si_generation = 0; + sbinfo->au_si_status = 0; + sbinfo->si_bend = -1; + sbinfo->si_last_br_id = 0; + + sbinfo->si_wbr_copyup = AuWbrCopyup_Def; + sbinfo->si_wbr_create = AuWbrCreate_Def; + sbinfo->si_wbr_copyup_ops = au_wbr_copyup_ops + AuWbrCopyup_Def; + sbinfo->si_wbr_create_ops = au_wbr_create_ops + AuWbrCreate_Def; + + sbinfo->si_mntflags = AuOpt_Def; + + sbinfo->si_xread = NULL; + sbinfo->si_xwrite = NULL; + sbinfo->si_xib = NULL; + mutex_init(&sbinfo->si_xib_mtx); + sbinfo->si_xib_buf = NULL; + sbinfo->si_xino_brid = -1; + /* leave si_xib_last_pindex and si_xib_next_bit */ + + sbinfo->si_rdcache = AUFS_RDCACHE_DEF * HZ; + sbinfo->si_dirwh = AUFS_DIRWH_DEF; + + au_spl_init(&sbinfo->si_plink); + init_waitqueue_head(&sbinfo->si_plink_wq); + + /* leave other members for sysaufs and si_mnt. */ + sbinfo->si_sb = sb; + sb->s_fs_info = sbinfo; + au_debug_sbinfo_init(sbinfo); + return 0; /* success */ + + out_br: + kfree(sbinfo->si_branch); + out_sbinfo: + kfree(sbinfo); + out: + return err; +} + +int au_sbr_realloc(struct au_sbinfo *sbinfo, int nbr) +{ + int err, sz; + struct au_branch **brp; + + err = -ENOMEM; + sz = sizeof(*brp) * (sbinfo->si_bend + 1); + if (unlikely(!sz)) + sz = sizeof(*brp); + brp = au_kzrealloc(sbinfo->si_branch, sz, sizeof(*brp) * nbr, GFP_NOFS); + if (brp) { + sbinfo->si_branch = brp; + err = 0; + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +unsigned int au_sigen_inc(struct super_block *sb) +{ + unsigned int gen; + + gen = ++au_sbi(sb)->si_generation; + au_update_digen(sb->s_root); + au_update_iigen(sb->s_root->d_inode); + sb->s_root->d_inode->i_version++; + return gen; +} + +aufs_bindex_t au_new_br_id(struct super_block *sb) +{ + aufs_bindex_t br_id; + int i; + struct au_sbinfo *sbinfo; + + sbinfo = au_sbi(sb); + for (i = 0; i <= AUFS_BRANCH_MAX; i++) { + br_id = ++sbinfo->si_last_br_id; + if (br_id && au_br_index(sb, br_id) < 0) + return br_id; + } + + return -1; +} + +/* ---------------------------------------------------------------------- */ + +/* dentry and super_block lock. call at entry point */ +void aufs_read_lock(struct dentry *dentry, int flags) +{ + si_read_lock(dentry->d_sb, flags); + if (au_ftest_lock(flags, DW)) + di_write_lock_child(dentry); + else + di_read_lock_child(dentry, flags); +} + +void aufs_read_unlock(struct dentry *dentry, int flags) +{ + if (au_ftest_lock(flags, DW)) + di_write_unlock(dentry); + else + di_read_unlock(dentry, flags); + si_read_unlock(dentry->d_sb); +} + +void aufs_write_lock(struct dentry *dentry) +{ + si_write_lock(dentry->d_sb); + di_write_lock_child(dentry); +} + +void aufs_write_unlock(struct dentry *dentry) +{ + di_write_unlock(dentry); + si_write_unlock(dentry->d_sb); +} + +void aufs_read_and_write_lock2(struct dentry *d1, struct dentry *d2, int flags) +{ + si_read_lock(d1->d_sb, flags); + di_write_lock2_child(d1, d2, au_ftest_lock(flags, DIR)); +} + +void aufs_read_and_write_unlock2(struct dentry *d1, struct dentry *d2) +{ + di_write_unlock2(d1, d2); + si_read_unlock(d1->d_sb); +} diff -urN ./fs/aufs/spl.h ./fs/aufs/spl.h --- ./fs/aufs/spl.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/spl.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,47 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * simple list protected by a spinlock + */ + +#ifndef __AUFS_SPL_H__ +#define __AUFS_SPL_H__ + +#ifdef __KERNEL__ + +#include + +struct au_splhead { + spinlock_t spin; + struct list_head head; +}; + +static inline void au_spl_init(struct au_splhead *spl) +{ + spin_lock_init(&spl->spin); + INIT_LIST_HEAD(&spl->head); +} + +static inline void au_spl_add(struct list_head *list, struct au_splhead *spl) +{ + spin_lock(&spl->spin); + list_add(list, &spl->head); + spin_unlock(&spl->spin); +} + +static inline void au_spl_del(struct list_head *list, struct au_splhead *spl) +{ + spin_lock(&spl->spin); + list_del(list); + spin_unlock(&spl->spin); +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_SPL_H__ */ diff -urN ./fs/aufs/super.c ./fs/aufs/super.c --- ./fs/aufs/super.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/super.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,854 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * mount and super_block operations + */ + +#include +#include +#include +#include "aufs.h" + +/* + * super_operations + */ +static struct inode *aufs_alloc_inode(struct super_block *sb __maybe_unused) +{ + struct au_icntnr *c; + + c = au_cache_alloc_icntnr(); + if (c) { + inode_init_once(&c->vfs_inode); + c->vfs_inode.i_version = 1; /* sigen(sb); */ + c->iinfo.ii_hinode = NULL; + return &c->vfs_inode; + } + return NULL; +} + +static void aufs_destroy_inode(struct inode *inode) +{ + au_iinfo_fin(inode); + au_cache_free_icntnr(container_of(inode, struct au_icntnr, vfs_inode)); +} + +struct inode *au_iget_locked(struct super_block *sb, ino_t ino) +{ + struct inode *inode; + int err; + + inode = iget_locked(sb, ino); + if (unlikely(!inode)) { + inode = ERR_PTR(-ENOMEM); + goto out; + } + if (!(inode->i_state & I_NEW)) + goto out; + + err = au_xigen_new(inode); + if (!err) + err = au_iinfo_init(inode); + if (!err) + inode->i_version++; + else { + iget_failed(inode); + inode = ERR_PTR(err); + } + + out: + /* never return NULL */ + AuDebugOn(!inode); + AuTraceErrPtr(inode); + return inode; +} + +/* lock free root dinfo */ +static int au_show_brs(struct seq_file *seq, struct super_block *sb) +{ + int err; + aufs_bindex_t bindex, bend; + struct path path; + struct au_hdentry *hd; + struct au_branch *br; + + err = 0; + bend = au_sbend(sb); + hd = au_di(sb->s_root)->di_hdentry; + for (bindex = 0; !err && bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + path.mnt = br->br_mnt; + path.dentry = hd[bindex].hd_dentry; + err = au_seq_path(seq, &path); + if (err > 0) + err = seq_printf(seq, "=%s", + au_optstr_br_perm(br->br_perm)); + if (!err && bindex != bend) + err = seq_putc(seq, ':'); + } + + return err; +} + +static void au_show_wbr_create(struct seq_file *m, int v, + struct au_sbinfo *sbinfo) +{ + const char *pat; + + seq_printf(m, ",create="); + pat = au_optstr_wbr_create(v); + switch (v) { + case AuWbrCreate_TDP: + case AuWbrCreate_RR: + case AuWbrCreate_MFS: + case AuWbrCreate_PMFS: + seq_printf(m, pat); + break; + case AuWbrCreate_MFSV: + seq_printf(m, /*pat*/"mfs:%lu", + sbinfo->si_wbr_mfs.mfs_expire / HZ); + break; + case AuWbrCreate_PMFSV: + seq_printf(m, /*pat*/"pmfs:%lu", + sbinfo->si_wbr_mfs.mfs_expire / HZ); + break; + case AuWbrCreate_MFSRR: + seq_printf(m, /*pat*/"mfsrr:%llu", + sbinfo->si_wbr_mfs.mfsrr_watermark); + break; + case AuWbrCreate_MFSRRV: + seq_printf(m, /*pat*/"mfsrr:%llu:%lu", + sbinfo->si_wbr_mfs.mfsrr_watermark, + sbinfo->si_wbr_mfs.mfs_expire / HZ); + break; + } +} + +static int au_show_xino(struct seq_file *seq, struct vfsmount *mnt) +{ +#ifdef CONFIG_SYSFS + return 0; +#else + int err; + const int len = sizeof(AUFS_XINO_FNAME) - 1; + aufs_bindex_t bindex, brid; + struct super_block *sb; + struct qstr *name; + struct file *f; + struct dentry *d, *h_root; + + err = 0; + sb = mnt->mnt_sb; + f = au_sbi(sb)->si_xib; + if (!f) + goto out; + + /* stop printing the default xino path on the first writable branch */ + h_root = NULL; + brid = au_xino_brid(sb); + if (brid >= 0) { + bindex = au_br_index(sb, brid); + h_root = au_di(sb->s_root)->di_hdentry[0 + bindex].hd_dentry; + } + d = f->f_dentry; + name = &d->d_name; + /* safe ->d_parent because the file is unlinked */ + if (d->d_parent == h_root + && name->len == len + && !memcmp(name->name, AUFS_XINO_FNAME, len)) + goto out; + + seq_puts(seq, ",xino="); + err = au_xino_path(seq, f); + + out: + return err; +#endif +} + +/* seq_file will re-call me in case of too long string */ +static int aufs_show_options(struct seq_file *m, struct vfsmount *mnt) +{ + int err, n; + unsigned int mnt_flags, v; + struct super_block *sb; + struct au_sbinfo *sbinfo; + +#define AuBool(name, str) do { \ + v = au_opt_test(mnt_flags, name); \ + if (v != au_opt_test(AuOpt_Def, name)) \ + seq_printf(m, ",%s" #str, v ? "" : "no"); \ +} while (0) + +#define AuStr(name, str) do { \ + v = mnt_flags & AuOptMask_##name; \ + if (v != (AuOpt_Def & AuOptMask_##name)) \ + seq_printf(m, "," #str "=%s", au_optstr_##str(v)); \ +} while (0) + + /* lock free root dinfo */ + sb = mnt->mnt_sb; + si_noflush_read_lock(sb); + sbinfo = au_sbi(sb); + seq_printf(m, ",si=%lx", sysaufs_si_id(sbinfo)); + + mnt_flags = au_mntflags(sb); + if (au_opt_test(mnt_flags, XINO)) { + err = au_show_xino(m, mnt); + if (unlikely(err)) + goto out; + } else + seq_puts(m, ",noxino"); + + AuBool(TRUNC_XINO, trunc_xino); + AuStr(UDBA, udba); + AuBool(PLINK, plink); + /* AuBool(DIRPERM1, dirperm1); */ + /* AuBool(REFROF, refrof); */ + + v = sbinfo->si_wbr_create; + if (v != AuWbrCreate_Def) + au_show_wbr_create(m, v, sbinfo); + + v = sbinfo->si_wbr_copyup; + if (v != AuWbrCopyup_Def) + seq_printf(m, ",cpup=%s", au_optstr_wbr_copyup(v)); + + v = au_opt_test(mnt_flags, ALWAYS_DIROPQ); + if (v != au_opt_test(AuOpt_Def, ALWAYS_DIROPQ)) + seq_printf(m, ",diropq=%c", v ? 'a' : 'w'); + + n = sbinfo->si_dirwh; + if (n != AUFS_DIRWH_DEF) + seq_printf(m, ",dirwh=%d", n); + + n = sbinfo->si_rdcache / HZ; + if (n != AUFS_RDCACHE_DEF) + seq_printf(m, ",rdcache=%d", n); + + AuBool(SUM, sum); + /* AuBool(SUM_W, wsum); */ + AuBool(WARN_PERM, warn_perm); + AuBool(VERBOSE, verbose); + + out: + /* be sure to print "br:" last */ + if (!sysaufs_brs) { + seq_puts(m, ",br:"); + au_show_brs(m, sb); + } + si_read_unlock(sb); + return 0; + +#undef Deleted +#undef AuBool +#undef AuStr +} + +/* ---------------------------------------------------------------------- */ + +/* sum mode which returns the summation for statfs(2) */ + +static u64 au_add_till_max(u64 a, u64 b) +{ + u64 old; + + old = a; + a += b; + if (old < a) + return a; + return ULLONG_MAX; +} + +static int au_statfs_sum(struct super_block *sb, struct kstatfs *buf) +{ + int err; + u64 blocks, bfree, bavail, files, ffree; + aufs_bindex_t bend, bindex, i; + unsigned char shared; + struct vfsmount *h_mnt; + struct super_block *h_sb; + + blocks = 0; + bfree = 0; + bavail = 0; + files = 0; + ffree = 0; + + err = 0; + bend = au_sbend(sb); + for (bindex = bend; bindex >= 0; bindex--) { + h_mnt = au_sbr_mnt(sb, bindex); + h_sb = h_mnt->mnt_sb; + shared = 0; + for (i = bindex + 1; !shared && i <= bend; i++) + shared = (au_sbr_sb(sb, i) == h_sb); + if (shared) + continue; + + /* sb->s_root for NFS is unreliable */ + err = vfs_statfs(h_mnt->mnt_root, buf); + if (unlikely(err)) + goto out; + + blocks = au_add_till_max(blocks, buf->f_blocks); + bfree = au_add_till_max(bfree, buf->f_bfree); + bavail = au_add_till_max(bavail, buf->f_bavail); + files = au_add_till_max(files, buf->f_files); + ffree = au_add_till_max(ffree, buf->f_ffree); + } + + buf->f_blocks = blocks; + buf->f_bfree = bfree; + buf->f_bavail = bavail; + buf->f_files = files; + buf->f_ffree = ffree; + + out: + return err; +} + +static int aufs_statfs(struct dentry *dentry, struct kstatfs *buf) +{ + int err; + struct super_block *sb; + + /* lock free root dinfo */ + sb = dentry->d_sb; + si_noflush_read_lock(sb); + if (!au_opt_test(au_mntflags(sb), SUM)) + /* sb->s_root for NFS is unreliable */ + err = vfs_statfs(au_sbr_mnt(sb, 0)->mnt_root, buf); + else + err = au_statfs_sum(sb, buf); + si_read_unlock(sb); + + if (!err) { + buf->f_type = AUFS_SUPER_MAGIC; + buf->f_namelen -= AUFS_WH_PFX_LEN; + memset(&buf->f_fsid, 0, sizeof(buf->f_fsid)); + } + /* buf->f_bsize = buf->f_blocks = buf->f_bfree = buf->f_bavail = -1; */ + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* try flushing the lower fs at aufs remount/unmount time */ + +static void au_fsync_br(struct super_block *sb) +{ + aufs_bindex_t bend, bindex; + int brperm; + struct au_branch *br; + struct super_block *h_sb; + + bend = au_sbend(sb); + for (bindex = 0; bindex < bend; bindex++) { + br = au_sbr(sb, bindex); + brperm = br->br_perm; + if (brperm == AuBrPerm_RR || brperm == AuBrPerm_RRWH) + continue; + h_sb = br->br_mnt->mnt_sb; + if (bdev_read_only(h_sb->s_bdev)) + continue; + + lockdep_off(); + down_write(&h_sb->s_umount); + shrink_dcache_sb(h_sb); + fsync_super(h_sb); + up_write(&h_sb->s_umount); + lockdep_on(); + } +} + +/* + * this IS NOT for super_operations. + * I guess it will be reverted someday. + */ +static void aufs_umount_begin(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + + sbinfo = au_sbi(sb); + if (!sbinfo) + return; + + si_write_lock(sb); + au_fsync_br(sb); + if (au_opt_test(au_mntflags(sb), PLINK)) + au_plink_put(sb); + if (sbinfo->si_wbr_create_ops->fin) + sbinfo->si_wbr_create_ops->fin(sb); + si_write_unlock(sb); +} + +/* final actions when unmounting a file system */ +static void aufs_put_super(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + + sbinfo = au_sbi(sb); + if (!sbinfo) + return; + + aufs_umount_begin(sb); + dbgaufs_si_fin(sbinfo); + kobject_put(&sbinfo->si_kobj); +} + +/* ---------------------------------------------------------------------- */ + +/* + * refresh dentry and inode at remount time. + */ +static int do_refresh(struct dentry *dentry, mode_t type, + unsigned int dir_flags) +{ + int err; + struct dentry *parent; + + di_write_lock_child(dentry); + parent = dget_parent(dentry); + di_read_lock_parent(parent, AuLock_IR); + + /* returns the number of positive dentries */ + err = au_refresh_hdentry(dentry, type); + if (err >= 0) { + struct inode *inode = dentry->d_inode; + err = au_refresh_hinode(inode, dentry); + if (!err && type == S_IFDIR) + au_reset_hinotify(inode, dir_flags); + } + if (unlikely(err)) + AuErr("unrecoverable error %d, %.*s\n", err, AuDLNPair(dentry)); + + di_read_unlock(parent, AuLock_IR); + dput(parent); + di_write_unlock(dentry); + + return err; +} + +static int test_dir(struct dentry *dentry, void *arg __maybe_unused) +{ + return S_ISDIR(dentry->d_inode->i_mode); +} + +/* gave up consolidating with refresh_nondir() */ +static int refresh_dir(struct dentry *root, unsigned int sigen) +{ + int err, i, j, ndentry, e; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + struct dentry **dentries; + struct inode *inode; + const unsigned int flags = au_hi_flags(root->d_inode, /*isdir*/1); + + err = 0; + list_for_each_entry(inode, &root->d_sb->s_inodes, i_sb_list) + if (S_ISDIR(inode->i_mode) && au_iigen(inode) != sigen) { + ii_write_lock_child(inode); + e = au_refresh_hinode_self(inode, /*do_attr*/1); + ii_write_unlock(inode); + if (unlikely(e)) { + AuDbg("e %d, i%lu\n", e, inode->i_ino); + if (!err) + err = e; + /* go on even if err */ + } + } + + e = au_dpages_init(&dpages, GFP_NOFS); + if (unlikely(e)) { + if (!err) + err = e; + goto out; + } + e = au_dcsub_pages(&dpages, root, test_dir, NULL); + if (unlikely(e)) { + if (!err) + err = e; + goto out_dpages; + } + + for (i = 0; !e && i < dpages.ndpage; i++) { + dpage = dpages.dpages + i; + dentries = dpage->dentries; + ndentry = dpage->ndentry; + for (j = 0; !e && j < ndentry; j++) { + struct dentry *d; + + d = dentries[j]; + au_dbg_verify_dir_parent(d, sigen); + if (au_digen(d) != sigen) { + e = do_refresh(d, S_IFDIR, flags); + if (unlikely(e && !err)) + err = e; + /* break on err */ + } + } + } + + out_dpages: + au_dpages_free(&dpages); + out: + return err; +} + +static int test_nondir(struct dentry *dentry, void *arg __maybe_unused) +{ + return !S_ISDIR(dentry->d_inode->i_mode); +} + +static int refresh_nondir(struct dentry *root, unsigned int sigen, + int do_dentry) +{ + int err, i, j, ndentry, e; + struct au_dcsub_pages dpages; + struct au_dpage *dpage; + struct dentry **dentries; + struct inode *inode; + + err = 0; + list_for_each_entry(inode, &root->d_sb->s_inodes, i_sb_list) + if (!S_ISDIR(inode->i_mode) && au_iigen(inode) != sigen) { + ii_write_lock_child(inode); + e = au_refresh_hinode_self(inode, /*do_attr*/1); + ii_write_unlock(inode); + if (unlikely(e)) { + AuDbg("e %d, i%lu\n", e, inode->i_ino); + if (!err) + err = e; + /* go on even if err */ + } + } + + if (!do_dentry) + goto out; + + e = au_dpages_init(&dpages, GFP_NOFS); + if (unlikely(e)) { + if (!err) + err = e; + goto out; + } + e = au_dcsub_pages(&dpages, root, test_nondir, NULL); + if (unlikely(e)) { + if (!err) + err = e; + goto out_dpages; + } + + for (i = 0; i < dpages.ndpage; i++) { + dpage = dpages.dpages + i; + dentries = dpage->dentries; + ndentry = dpage->ndentry; + for (j = 0; j < ndentry; j++) { + struct dentry *d; + + d = dentries[j]; + au_dbg_verify_nondir_parent(d, sigen); + inode = d->d_inode; + if (inode && au_digen(d) != sigen) { + e = do_refresh(d, inode->i_mode & S_IFMT, + /*dir_flags*/0); + if (unlikely(e && !err)) + err = e; + /* go on even err */ + } + } + } + + out_dpages: + au_dpages_free(&dpages); + out: + return err; +} + +static void au_remount_refresh(struct super_block *sb, unsigned int flags) +{ + int err; + unsigned int sigen; + struct au_sbinfo *sbinfo; + struct dentry *root; + struct inode *inode; + + au_sigen_inc(sb); + sigen = au_sigen(sb); + sbinfo = au_sbi(sb); + au_fclr_si(sbinfo, FAILED_REFRESH_DIRS); + + root = sb->s_root; + DiMustNoWaiters(root); + inode = root->d_inode; + IiMustNoWaiters(inode); + au_reset_hinotify(inode, au_hi_flags(inode, /*isdir*/1)); + di_write_unlock(root); + + err = refresh_dir(root, sigen); + if (unlikely(err)) { + au_fset_si(sbinfo, FAILED_REFRESH_DIRS); + AuWarn("Refreshing directories failed, ignored (%d)\n", err); + } + + if (au_ftest_opts(flags, REFRESH_NONDIR)) { + err = refresh_nondir(root, sigen, !err); + if (unlikely(err)) + AuWarn("Refreshing non-directories failed, ignored" + "(%d)\n", err); + } + + /* aufs_write_lock() calls ..._child() */ + di_write_lock_child(root); + au_cpup_attr_all(root->d_inode, /*force*/1); +} + +/* stop extra interpretation of errno in mount(8), and strange error messages */ +static int cvt_err(int err) +{ + AuTraceErr(err); + + switch (err) { + case -ENOENT: + case -ENOTDIR: + case -EEXIST: + case -EIO: + err = -EINVAL; + } + return err; +} + +static int aufs_remount_fs(struct super_block *sb, int *flags, char *data) +{ + int err; + struct au_opts opts; + struct dentry *root; + struct inode *inode; + struct au_sbinfo *sbinfo; + + err = 0; + root = sb->s_root; + if (!data || !*data) { + aufs_write_lock(root); + err = au_opts_verify(sb, *flags, /*pending*/0); + if (!err) + au_fsync_br(sb); + aufs_write_unlock(root); + goto out; + } + + err = -ENOMEM; + memset(&opts, 0, sizeof(opts)); + opts.opt = (void *)__get_free_page(GFP_NOFS); + if (unlikely(!opts.opt)) + goto out; + opts.max_opt = PAGE_SIZE / sizeof(*opts.opt); + opts.flags = AuOpts_REMOUNT; + opts.sb_flags = *flags; + + /* parse it before aufs lock */ + err = au_opts_parse(sb, data, &opts); + if (unlikely(err)) + goto out_opts; + + sbinfo = au_sbi(sb); + inode = root->d_inode; + mutex_lock(&inode->i_mutex); + aufs_write_lock(root); + au_fsync_br(sb); + + /* au_opts_remount() may return an error */ + err = au_opts_remount(sb, &opts); + au_opts_free(&opts); + + if (au_ftest_opts(opts.flags, REFRESH_DIR) + || au_ftest_opts(opts.flags, REFRESH_NONDIR)) + au_remount_refresh(sb, opts.flags); + + aufs_write_unlock(root); + mutex_unlock(&inode->i_mutex); + + out_opts: + free_page((unsigned long)opts.opt); + out: + err = cvt_err(err); + AuTraceErr(err); + return err; +} + +static struct super_operations aufs_sop = { + .alloc_inode = aufs_alloc_inode, + .destroy_inode = aufs_destroy_inode, + .drop_inode = generic_delete_inode, + .show_options = aufs_show_options, + .statfs = aufs_statfs, + .put_super = aufs_put_super, + .remount_fs = aufs_remount_fs +}; + +/* ---------------------------------------------------------------------- */ + +static int alloc_root(struct super_block *sb) +{ + int err; + struct inode *inode; + struct dentry *root; + + err = -ENOMEM; + inode = au_iget_locked(sb, AUFS_ROOT_INO); + err = PTR_ERR(inode); + if (IS_ERR(inode)) + goto out; + + inode->i_op = &aufs_dir_iop; + inode->i_fop = &aufs_dir_fop; + inode->i_mode = S_IFDIR; + inode->i_nlink = 2; + unlock_new_inode(inode); + + root = d_alloc_root(inode); + if (unlikely(!root)) + goto out_iput; + err = PTR_ERR(root); + if (IS_ERR(root)) + goto out_iput; + + err = au_alloc_dinfo(root); + if (!err) { + sb->s_root = root; + return 0; /* success */ + } + dput(root); + goto out; /* do not iput */ + + out_iput: + iget_failed(inode); + iput(inode); + out: + return err; + +} + +static int aufs_fill_super(struct super_block *sb, void *raw_data, + int silent __maybe_unused) +{ + int err; + struct au_opts opts; + struct dentry *root; + struct inode *inode; + char *arg = raw_data; + + if (unlikely(!arg || !*arg)) { + err = -EINVAL; + AuErr("no arg\n"); + goto out; + } + + err = -ENOMEM; + memset(&opts, 0, sizeof(opts)); + opts.opt = (void *)__get_free_page(GFP_NOFS); + if (unlikely(!opts.opt)) + goto out; + opts.max_opt = PAGE_SIZE / sizeof(*opts.opt); + opts.sb_flags = sb->s_flags; + + err = au_si_alloc(sb); + if (unlikely(err)) + goto out_opts; + + /* all timestamps always follow the ones on the branch */ + sb->s_flags |= MS_NOATIME | MS_NODIRATIME; + sb->s_op = &aufs_sop; + sb->s_magic = AUFS_SUPER_MAGIC; + sb->s_maxbytes = 0; + au_export_init(sb); + + err = alloc_root(sb); + if (unlikely(err)) { + si_write_unlock(sb); + goto out_info; + } + root = sb->s_root; + inode = root->d_inode; + + /* + * actually we can parse options regardless aufs lock here. + * but at remount time, parsing must be done before aufs lock. + * so we follow the same rule. + */ + ii_write_lock_parent(inode); + aufs_write_unlock(root); + err = au_opts_parse(sb, arg, &opts); + if (unlikely(err)) + goto out_root; + + /* lock vfs_inode first, then aufs. */ + mutex_lock(&inode->i_mutex); + inode->i_op = &aufs_dir_iop; + inode->i_fop = &aufs_dir_fop; + aufs_write_lock(root); + err = au_opts_mount(sb, &opts); + au_opts_free(&opts); + if (unlikely(err)) + goto out_unlock; + aufs_write_unlock(root); + mutex_unlock(&inode->i_mutex); + goto out_opts; /* success */ + + out_unlock: + aufs_write_unlock(root); + mutex_unlock(&inode->i_mutex); + out_root: + dput(root); + sb->s_root = NULL; + out_info: + kobject_put(&au_sbi(sb)->si_kobj); + sb->s_fs_info = NULL; + out_opts: + free_page((unsigned long)opts.opt); + out: + AuTraceErr(err); + err = cvt_err(err); + AuTraceErr(err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int aufs_get_sb(struct file_system_type *fs_type, int flags, + const char *dev_name __maybe_unused, void *raw_data, + struct vfsmount *mnt) +{ + int err; + struct super_block *sb; + + /* all timestamps always follow the ones on the branch */ + /* mnt->mnt_flags |= MNT_NOATIME | MNT_NODIRATIME; */ + err = get_sb_nodev(fs_type, flags, raw_data, aufs_fill_super, mnt); + if (!err) { + sb = mnt->mnt_sb; + si_write_lock(sb); + sysaufs_brs_add(sb, 0); + si_write_unlock(sb); + } + return err; +} + +struct file_system_type aufs_fs_type = { + .name = AUFS_FSTYPE, + .fs_flags = + FS_RENAME_DOES_D_MOVE /* a race between rename and others */ + | FS_REVAL_DOT, /* for NFS branch and udba */ + .get_sb = aufs_get_sb, + .kill_sb = generic_shutdown_super, + /* no need to __module_get() and module_put(). */ + .owner = THIS_MODULE, +}; diff -urN ./fs/aufs/super.h ./fs/aufs/super.h --- ./fs/aufs/super.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/super.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,349 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * super_block operations + */ + +#ifndef __AUFS_SUPER_H__ +#define __AUFS_SUPER_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include "rwsem.h" +#include "spl.h" +#include "wkq.h" + +typedef ssize_t (*au_readf_t)(struct file *, char __user *, size_t, loff_t *); +typedef ssize_t (*au_writef_t)(struct file *, const char __user *, size_t, + loff_t *); + +/* policies to select one among multiple writable branches */ +struct au_wbr_copyup_operations { + int (*copyup)(struct dentry *dentry); +}; + +struct au_wbr_create_operations { + int (*create)(struct dentry *dentry, int isdir); + int (*init)(struct super_block *sb); + int (*fin)(struct super_block *sb); +}; + +struct au_wbr_mfs { + struct mutex mfs_lock; /* protect this structure */ + unsigned long mfs_jiffy; + unsigned long mfs_expire; + aufs_bindex_t mfs_bindex; + + unsigned long long mfsrr_bytes; + unsigned long long mfsrr_watermark; +}; + +/* sbinfo status flags */ +/* + * set true when refresh_dirs() failed at remount time. + * then try refreshing dirs at access time again. + * if it is false, refreshing dirs at access time is unnecesary + */ +#define AuSi_FAILED_REFRESH_DIRS 1 +#define AuSi_MAINTAIN_PLINK (1 << 1) /* ioctl */ +#define au_ftest_si(sbinfo, name) ((sbinfo)->au_si_status & AuSi_##name) +#define au_fset_si(sbinfo, name) \ + { (sbinfo)->au_si_status |= AuSi_##name; } +#define au_fclr_si(sbinfo, name) \ + { (sbinfo)->au_si_status &= ~AuSi_##name; } + +struct au_branch; +struct au_sbinfo { + /* nowait tasks in the system-wide workqueue */ + struct au_nowait_tasks si_nowait; + + struct rw_semaphore si_rwsem; + + /* branch management */ + unsigned int si_generation; + + /* see above flags */ + unsigned char au_si_status; + + aufs_bindex_t si_bend; + aufs_bindex_t si_last_br_id; + struct au_branch **si_branch; + + /* policy to select a writable branch */ + unsigned char si_wbr_copyup; + unsigned char si_wbr_create; + struct au_wbr_copyup_operations *si_wbr_copyup_ops; + struct au_wbr_create_operations *si_wbr_create_ops; + + /* round robin */ + atomic_t si_wbr_rr_next; + + /* most free space */ + struct au_wbr_mfs si_wbr_mfs; + + /* mount flags */ + /* include/asm-ia64/siginfo.h defines a macro named si_flags */ + unsigned int si_mntflags; + + /* external inode number (bitmap and translation table) */ + au_readf_t si_xread; + au_writef_t si_xwrite; + struct file *si_xib; + struct mutex si_xib_mtx; /* protect xib members */ + unsigned long *si_xib_buf; + unsigned long si_xib_last_pindex; + int si_xib_next_bit; + aufs_bindex_t si_xino_brid; + /* reserved for future use */ + /* unsigned long long si_xib_limit; */ /* Max xib file size */ + +#ifdef CONFIG_AUFS_EXPORT + /* i_generation */ + struct file *si_xigen; + atomic_t si_xigen_next; +#endif + + /* readdir cache time, max, in HZ */ + unsigned long si_rdcache; + + /* + * If the number of whiteouts are larger than si_dirwh, leave all of + * them after au_whtmp_ren to reduce the cost of rmdir(2). + * future fsck.aufs or kernel thread will remove them later. + * Otherwise, remove all whiteouts and the dir in rmdir(2). + */ + unsigned int si_dirwh; + + /* + * rename(2) a directory with all children. + */ + /* reserved for future use */ + /* int si_rendir; */ + + /* pseudo_link list */ + struct au_splhead si_plink; + wait_queue_head_t si_plink_wq; + + /* + * sysfs and lifetime management. + * this is not a small structure and it may be a waste of memory in case + * of sysfs is disabled, particulary when many aufs-es are mounted. + * but using sysfs is majority. + */ + struct kobject si_kobj; +#ifdef CONFIG_DEBUG_FS + struct dentry *si_dbgaufs, *si_dbgaufs_xib; +#ifdef CONFIG_AUFS_EXPORT + struct dentry *si_dbgaufs_xigen; +#endif +#endif + + /* dirty, necessary for unmounting, sysfs and sysrq */ + struct super_block *si_sb; +}; + +/* ---------------------------------------------------------------------- */ + +/* policy to select one among writable branches */ +#define AuWbrCopyup(sbinfo, args...) \ + ((sbinfo)->si_wbr_copyup_ops->copyup(args)) +#define AuWbrCreate(sbinfo, args...) \ + ((sbinfo)->si_wbr_create_ops->create(args)) + +/* flags for si_read_lock()/aufs_read_lock()/di_read_lock() */ +#define AuLock_DW 1 /* write-lock dentry */ +#define AuLock_IR (1 << 1) /* read-lock inode */ +#define AuLock_IW (1 << 2) /* write-lock inode */ +#define AuLock_FLUSH (1 << 3) /* wait for 'nowait' tasks */ +#define AuLock_DIR (1 << 4) /* target is a dir */ +#define au_ftest_lock(flags, name) ((flags) & AuLock_##name) +#define au_fset_lock(flags, name) { (flags) |= AuLock_##name; } +#define au_fclr_lock(flags, name) { (flags) &= ~AuLock_##name; } + +/* ---------------------------------------------------------------------- */ + +/* super.c */ +extern struct file_system_type aufs_fs_type; +struct inode *au_iget_locked(struct super_block *sb, ino_t ino); + +/* sbinfo.c */ +void au_si_free(struct kobject *kobj); +int au_si_alloc(struct super_block *sb); +int au_sbr_realloc(struct au_sbinfo *sbinfo, int nbr); + +unsigned int au_sigen_inc(struct super_block *sb); +aufs_bindex_t au_new_br_id(struct super_block *sb); + +void aufs_read_lock(struct dentry *dentry, int flags); +void aufs_read_unlock(struct dentry *dentry, int flags); +void aufs_write_lock(struct dentry *dentry); +void aufs_write_unlock(struct dentry *dentry); +void aufs_read_and_write_lock2(struct dentry *d1, struct dentry *d2, int isdir); +void aufs_read_and_write_unlock2(struct dentry *d1, struct dentry *d2); + +/* wbr_policy.c */ +extern struct au_wbr_copyup_operations au_wbr_copyup_ops[]; +extern struct au_wbr_create_operations au_wbr_create_ops[]; +int au_cpdown_dirs(struct dentry *dentry, aufs_bindex_t bdst); + +/* ---------------------------------------------------------------------- */ + +static inline struct au_sbinfo *au_sbi(struct super_block *sb) +{ + return sb->s_fs_info; +} + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_EXPORT +void au_export_init(struct super_block *sb); + +static inline int au_test_nfsd(struct task_struct *tsk) +{ + return !tsk->mm && !strcmp(tsk->comm, "nfsd"); +} + +int au_xigen_inc(struct inode *inode); +int au_xigen_new(struct inode *inode); +int au_xigen_set(struct super_block *sb, struct file *base); +void au_xigen_clr(struct super_block *sb); + +static inline int au_busy_or_stale(void) +{ + if (!au_test_nfsd(current)) + return -EBUSY; + return -ESTALE; +} +#else +static inline void au_export_init(struct super_block *sb) +{ + /* nothing */ +} + +static inline int au_test_nfsd(struct task_struct *tsk) +{ + return 0; +} + +static inline int au_xigen_inc(struct inode *inode) +{ + return 0; +} + +static inline int au_xigen_new(struct inode *inode) +{ + return 0; +} + +static inline int au_xigen_set(struct super_block *sb, struct file *base) +{ + return 0; +} + +static inline void au_xigen_clr(struct super_block *sb) +{ + /* empty */ +} + +static inline int au_busy_or_stale(void) +{ + return -EBUSY; +} +#endif /* CONFIG_AUFS_EXPORT */ + +/* ---------------------------------------------------------------------- */ + +static inline void dbgaufs_si_null(struct au_sbinfo *sbinfo) +{ +#ifdef CONFIG_DEBUG_FS + sbinfo->si_dbgaufs = NULL; + sbinfo->si_dbgaufs_xib = NULL; +#ifdef CONFIG_AUFS_EXPORT + sbinfo->si_dbgaufs_xigen = NULL; +#endif +#endif +} + +/* ---------------------------------------------------------------------- */ + +/* lock superblock. mainly for entry point functions */ +/* + * si_noflush_read_lock, si_noflush_write_lock, + * si_read_unlock, si_write_unlock, si_downgrade_lock + */ +AuSimpleLockRwsemFuncs(si_noflush, struct super_block *sb, + &au_sbi(sb)->si_rwsem); +AuSimpleUnlockRwsemFuncs(si, struct super_block *sb, &au_sbi(sb)->si_rwsem); + +static inline void si_read_lock(struct super_block *sb, int flags) +{ + if (au_ftest_lock(flags, FLUSH)) + au_nwt_flush(&au_sbi(sb)->si_nowait); + si_noflush_read_lock(sb); +} + +static inline void si_write_lock(struct super_block *sb) +{ + au_nwt_flush(&au_sbi(sb)->si_nowait); + si_noflush_write_lock(sb); +} + +static inline int si_read_trylock(struct super_block *sb, int flags) +{ + if (au_ftest_lock(flags, FLUSH)) + au_nwt_flush(&au_sbi(sb)->si_nowait); + return si_noflush_read_trylock(sb); +} + +static inline int si_write_trylock(struct super_block *sb, int flags) +{ + if (au_ftest_lock(flags, FLUSH)) + au_nwt_flush(&au_sbi(sb)->si_nowait); + return si_noflush_write_trylock(sb); +} + +/* ---------------------------------------------------------------------- */ + +static inline aufs_bindex_t au_sbend(struct super_block *sb) +{ + return au_sbi(sb)->si_bend; +} + +static inline unsigned int au_mntflags(struct super_block *sb) +{ + return au_sbi(sb)->si_mntflags; +} + +static inline unsigned int au_sigen(struct super_block *sb) +{ + return au_sbi(sb)->si_generation; +} + +static inline struct au_branch *au_sbr(struct super_block *sb, + aufs_bindex_t bindex) +{ + return au_sbi(sb)->si_branch[0 + bindex]; +} + +static inline void au_xino_brid_set(struct super_block *sb, aufs_bindex_t brid) +{ + au_sbi(sb)->si_xino_brid = brid; +} + +static inline aufs_bindex_t au_xino_brid(struct super_block *sb) +{ + return au_sbi(sb)->si_xino_brid; +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_SUPER_H__ */ diff -urN ./fs/aufs/sysaufs.c ./fs/aufs/sysaufs.c --- ./fs/aufs/sysaufs.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/sysaufs.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,95 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sysfs interface and lifetime management + * they are necessary regardless sysfs is disabled. + */ + +#include +#include +#include +#include "aufs.h" + +unsigned long sysaufs_si_mask; +struct kset *sysaufs_ket; + +#define AuSiAttr(_name) { \ + .attr = { .name = __stringify(_name), .mode = 0444 }, \ + .show = sysaufs_si_##_name, \ +} + +static struct sysaufs_si_attr sysaufs_si_attr_xi_path = AuSiAttr(xi_path); +struct attribute *sysaufs_si_attrs[] = { + &sysaufs_si_attr_xi_path.attr, + NULL, +}; + +static struct sysfs_ops au_sbi_ops = { + .show = sysaufs_si_show +}; + +static struct kobj_type au_sbi_ktype = { + .release = au_si_free, + .sysfs_ops = &au_sbi_ops, + .default_attrs = sysaufs_si_attrs +}; + +/* ---------------------------------------------------------------------- */ + +int sysaufs_si_init(struct au_sbinfo *sbinfo) +{ + int err; + + sbinfo->si_kobj.kset = sysaufs_ket; + /* cf. sysaufs_name() */ + err = kobject_init_and_add + (&sbinfo->si_kobj, &au_sbi_ktype, /*&sysaufs_ket->kobj*/NULL, + SysaufsSiNamePrefix "%lx", sysaufs_si_id(sbinfo)); + + dbgaufs_si_null(sbinfo); + if (!err) { + err = dbgaufs_si_init(sbinfo); + if (unlikely(err)) + kobject_put(&sbinfo->si_kobj); + } + return err; +} + +void sysaufs_fin(void) +{ + dbgaufs_fin(); + sysfs_remove_group(&sysaufs_ket->kobj, sysaufs_attr_group); + kset_unregister(sysaufs_ket); +} + +int __init sysaufs_init(void) +{ + int err; + + do { + get_random_bytes(&sysaufs_si_mask, sizeof(sysaufs_si_mask)); + } while (!sysaufs_si_mask); + + sysaufs_ket = kset_create_and_add(AUFS_NAME, NULL, fs_kobj); + err = PTR_ERR(sysaufs_ket); + if (IS_ERR(sysaufs_ket)) + goto out; + err = sysfs_create_group(&sysaufs_ket->kobj, sysaufs_attr_group); + if (unlikely(err)) { + kset_unregister(sysaufs_ket); + goto out; + } + + err = dbgaufs_init(); + if (unlikely(err)) + sysaufs_fin(); + out: + return err; +} diff -urN ./fs/aufs/sysaufs.h ./fs/aufs/sysaufs.h --- ./fs/aufs/sysaufs.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/sysaufs.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,109 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sysfs interface and mount lifetime management + */ + +#ifndef __SYSAUFS_H__ +#define __SYSAUFS_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include "module.h" + +struct sysaufs_si_attr { + struct attribute attr; + int (*show)(struct seq_file *seq, struct super_block *sb); +}; + +/* ---------------------------------------------------------------------- */ + +/* sysaufs.c */ +extern unsigned long sysaufs_si_mask; +extern struct kset *sysaufs_ket; +extern struct attribute *sysaufs_si_attrs[]; +int sysaufs_si_init(struct au_sbinfo *sbinfo); +int __init sysaufs_init(void); +void sysaufs_fin(void); + +/* ---------------------------------------------------------------------- */ + +/* some people doesn't like to show a pointer in kernel */ +static inline unsigned long sysaufs_si_id(struct au_sbinfo *sbinfo) +{ + return sysaufs_si_mask ^ (unsigned long)sbinfo; +} + +#define SysaufsSiNamePrefix "si_" +#define SysaufsSiNameLen (sizeof(SysaufsSiNamePrefix) + 16) +static inline void sysaufs_name(struct au_sbinfo *sbinfo, char *name) +{ + snprintf(name, SysaufsSiNameLen, SysaufsSiNamePrefix "%lx", + sysaufs_si_id(sbinfo)); +} + +struct au_branch; +#ifdef CONFIG_SYSFS +/* sysfs.c */ +extern struct attribute_group *sysaufs_attr_group; + +int sysaufs_si_xi_path(struct seq_file *seq, struct super_block *sb); +ssize_t sysaufs_si_show(struct kobject *kobj, struct attribute *attr, + char *buf); + +void sysaufs_br_init(struct au_branch *br); +void sysaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex); +void sysaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex); + +#define sysaufs_brs_init() do {} while (0) + +#else +#define sysaufs_attr_group NULL + +static inline +int sysaufs_si_xi_path(struct seq_file *seq, struct super_block *sb) +{ + return 0; +} + +static inline +ssize_t sysaufs_si_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + return 0; +} + +static inline void sysaufs_br_init(struct au_branch *br) +{ + /* empty */ +} + +static inline void sysaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex) +{ + /* nothing */ +} + +static inline void sysaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex) +{ + /* nothing */ +} + +static inline void sysaufs_brs_init(void) +{ + sysaufs_brs = 0; +} + +#endif /* CONFIG_SYSFS */ + +#endif /* __KERNEL__ */ +#endif /* __SYSAUFS_H__ */ diff -urN ./fs/aufs/sysfs.c ./fs/aufs/sysfs.c --- ./fs/aufs/sysfs.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/sysfs.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,198 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sysfs interface + */ + +#include +#include +#include +#include "aufs.h" + +static struct attribute *au_attr[] = { + NULL, /* need to NULL terminate the list of attributes */ +}; + +static struct attribute_group sysaufs_attr_group_body = { + .attrs = au_attr +}; + +struct attribute_group *sysaufs_attr_group = &sysaufs_attr_group_body; + +/* ---------------------------------------------------------------------- */ + +int sysaufs_si_xi_path(struct seq_file *seq, struct super_block *sb) +{ + int err; + + err = 0; + if (au_opt_test(au_mntflags(sb), XINO)) { + err = au_xino_path(seq, au_sbi(sb)->si_xib); + seq_putc(seq, '\n'); + } + return err; +} + +/* + * the lifetime of branch is independent from the entry under sysfs. + * sysfs handles the lifetime of the entry, and never call ->show() after it is + * unlinked. + */ +static int sysaufs_si_br(struct seq_file *seq, struct super_block *sb, + aufs_bindex_t bindex) +{ + struct path path; + struct dentry *root; + struct au_branch *br; + + AuDbg("b%d\n", bindex); + + root = sb->s_root; + di_read_lock_parent(root, !AuLock_IR); + br = au_sbr(sb, bindex); + path.mnt = br->br_mnt; + path.dentry = au_h_dptr(root, bindex); + au_seq_path(seq, &path); + di_read_unlock(root, !AuLock_IR); + seq_printf(seq, "=%s\n", au_optstr_br_perm(br->br_perm)); + return 0; +} + +/* ---------------------------------------------------------------------- */ + +static struct seq_file *au_seq(char *p, ssize_t len) +{ + struct seq_file *seq; + + seq = kzalloc(sizeof(*seq), GFP_NOFS); + if (seq) { + /* mutex_init(&seq.lock); */ + seq->buf = p; + seq->size = len; + return seq; /* success */ + } + + seq = ERR_PTR(-ENOMEM); + return seq; +} + +#define SysaufsBr_PREFIX "br" + +/* todo: file size may exceed PAGE_SIZE */ +ssize_t sysaufs_si_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + ssize_t err; + long l; + aufs_bindex_t bend; + struct au_sbinfo *sbinfo; + struct super_block *sb; + struct seq_file *seq; + char *name; + struct attribute **cattr; + + sbinfo = container_of(kobj, struct au_sbinfo, si_kobj); + sb = sbinfo->si_sb; + si_noflush_read_lock(sb); + + seq = au_seq(buf, PAGE_SIZE); + err = PTR_ERR(seq); + if (IS_ERR(seq)) + goto out; + + name = (void *)attr->name; + cattr = sysaufs_si_attrs; + while (*cattr) { + if (!strcmp(name, (*cattr)->name)) { + err = container_of(*cattr, struct sysaufs_si_attr, attr) + ->show(seq, sb); + goto out_seq; + } + cattr++; + } + + bend = au_sbend(sb); + if (!strncmp(name, SysaufsBr_PREFIX, sizeof(SysaufsBr_PREFIX) - 1)) { + name += sizeof(SysaufsBr_PREFIX) - 1; + err = strict_strtol(name, 10, &l); + if (!err) { + if (l <= bend) + err = sysaufs_si_br(seq, sb, (aufs_bindex_t)l); + else + err = -ENOENT; + } + goto out_seq; + } + BUG(); + + out_seq: + if (!err) { + err = seq->count; + /* sysfs limit */ + if (unlikely(err == PAGE_SIZE)) + err = -EFBIG; + } + kfree(seq); + out: + si_read_unlock(sb); + return err; +} + +/* ---------------------------------------------------------------------- */ + +void sysaufs_br_init(struct au_branch *br) +{ + br->br_attr.name = br->br_name; + br->br_attr.mode = S_IRUGO; + br->br_attr.owner = THIS_MODULE; +} + +void sysaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex) +{ + struct au_branch *br; + struct kobject *kobj; + aufs_bindex_t bend; + + dbgaufs_brs_del(sb, bindex); + + if (!sysaufs_brs) + return; + + kobj = &au_sbi(sb)->si_kobj; + bend = au_sbend(sb); + for (; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + sysfs_remove_file(kobj, &br->br_attr); + } +} + +void sysaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex) +{ + int err; + aufs_bindex_t bend; + struct kobject *kobj; + struct au_branch *br; + + dbgaufs_brs_add(sb, bindex); + + if (!sysaufs_brs) + return; + + kobj = &au_sbi(sb)->si_kobj; + bend = au_sbend(sb); + for (; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + snprintf(br->br_name, sizeof(br->br_name), SysaufsBr_PREFIX + "%d", bindex); + err = sysfs_create_file(kobj, &br->br_attr); + if (unlikely(err)) + AuWarn("failed %s under sysfs(%d)\n", br->br_name, err); + } +} diff -urN ./fs/aufs/sysrq.c ./fs/aufs/sysrq.c --- ./fs/aufs/sysrq.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/sysrq.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,106 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * magic sysrq hanlder + */ + +#include +#include +#include +/* #include */ +#include "aufs.h" + +/* ---------------------------------------------------------------------- */ + +static void sysrq_sb(struct super_block *sb) +{ + char *plevel; + struct au_sbinfo *sbinfo; + struct file *file; + + plevel = au_plevel; + au_plevel = KERN_WARNING; + au_debug(1); + + sbinfo = au_sbi(sb); + pr_warning("si=%lx\n", sysaufs_si_id(sbinfo)); + pr_warning(AUFS_NAME ": superblock\n"); + au_dpri_sb(sb); + pr_warning(AUFS_NAME ": root dentry\n"); + au_dpri_dentry(sb->s_root); + pr_warning(AUFS_NAME ": root inode\n"); + au_dpri_inode(sb->s_root->d_inode); +#if 0 + struct inode *i; + pr_warning(AUFS_NAME ": isolated inode\n"); + list_for_each_entry(i, &sb->s_inodes, i_sb_list) + if (list_empty(&i->i_dentry)) + au_dpri_inode(i); +#endif + pr_warning(AUFS_NAME ": files\n"); + list_for_each_entry(file, &sb->s_files, f_u.fu_list) + if (!special_file(file->f_dentry->d_inode->i_mode)) + au_dpri_file(file); + + au_plevel = plevel; + au_debug(0); +} + +/* ---------------------------------------------------------------------- */ + +/* module parameter */ +static char *aufs_sysrq_key = "a"; +module_param_named(sysrq, aufs_sysrq_key, charp, S_IRUGO); +MODULE_PARM_DESC(sysrq, "MagicSysRq key for " AUFS_NAME); + +static void au_sysrq(int key __maybe_unused, + struct tty_struct *tty __maybe_unused) +{ + struct kobject *kobj; + struct au_sbinfo *sbinfo; + + /* spin_lock(&sysaufs_ket->list_lock); */ + list_for_each_entry(kobj, &sysaufs_ket->list, entry) { + sbinfo = container_of(kobj, struct au_sbinfo, si_kobj); + sysrq_sb(sbinfo->si_sb); + } + /* spin_unlock(&sysaufs_ket->list_lock); */ +} + +static struct sysrq_key_op au_sysrq_op = { + .handler = au_sysrq, + .help_msg = "Aufs", + .action_msg = "Aufs", + .enable_mask = SYSRQ_ENABLE_DUMP +}; + +/* ---------------------------------------------------------------------- */ + +int __init au_sysrq_init(void) +{ + int err; + char key; + + err = -1; + key = *aufs_sysrq_key; + if ('a' <= key && key <= 'z') + err = register_sysrq_key(key, &au_sysrq_op); + if (unlikely(err)) + AuErr("err %d, sysrq=%c\n", err, key); + return err; +} + +void au_sysrq_fin(void) +{ + int err; + err = unregister_sysrq_key(*aufs_sysrq_key, &au_sysrq_op); + if (unlikely(err)) + AuErr("err %d (ignored)\n", err); +} diff -urN ./fs/aufs/vdir.c ./fs/aufs/vdir.c --- ./fs/aufs/vdir.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/vdir.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,776 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * virtual or vertical directory + */ + +#include "aufs.h" + +static int calc_size(int namelen) +{ + int sz; + const int mask = sizeof(ino_t) - 1; + + BUILD_BUG_ON(sizeof(ino_t) != sizeof(long)); + + sz = sizeof(struct au_vdir_de) + namelen; + if (sz & mask) { + sz += sizeof(ino_t); + sz &= ~mask; + } + + AuDebugOn(sz % sizeof(ino_t)); + return sz; +} + +static int set_deblk_end(union au_vdir_deblk_p *p, + union au_vdir_deblk_p *deblk_end) +{ + if (calc_size(0) <= deblk_end->p - p->p) { + p->de->de_str.len = 0; + /* smp_mb(); */ + return 0; + } + return -1; /* error */ +} + +/* returns true or false */ +static int is_deblk_end(union au_vdir_deblk_p *p, + union au_vdir_deblk_p *deblk_end) +{ + if (calc_size(0) <= deblk_end->p - p->p) + return !p->de->de_str.len; + return 1; +} + +static au_vdir_deblk_t *last_deblk(struct au_vdir *vdir) +{ + return vdir->vd_deblk[vdir->vd_nblk - 1]; +} + +void au_nhash_init(struct au_nhash *nhash) +{ + int i; + struct hlist_head *heads; + + heads = nhash->heads; + for (i = 0; i < AuSize_NHASH; i++) + INIT_HLIST_HEAD(heads++); +} + +struct au_nhash *au_nhash_new(gfp_t gfp) +{ + struct au_nhash *nhash; + + nhash = kmalloc(sizeof(*nhash), gfp); + if (nhash) { + au_nhash_init(nhash); + return nhash; + } + return ERR_PTR(-ENOMEM); +} + +void au_nhash_del(struct au_nhash *nhash) +{ + au_nhash_fin(nhash); + kfree(nhash); +} + +void au_nhash_move(struct au_nhash *dst, struct au_nhash *src) +{ + int i; + struct hlist_head *dsth, *srch; + + *dst = *src; + srch = src->heads; + dsth = dst->heads; + for (i = 0; i < AuSize_NHASH; i++) { + if (dsth->first) + dsth->first->pprev = &dsth->first; + dsth++; + INIT_HLIST_HEAD(srch++); + } + /* smp_mb(); */ +} + +/* ---------------------------------------------------------------------- */ + +void au_nhash_fin(struct au_nhash *whlist) +{ + int i; + struct hlist_head *head; + struct au_vdir_wh *tpos; + struct hlist_node *pos, *n; + + head = whlist->heads; + for (i = 0; i < AuSize_NHASH; i++) { + hlist_for_each_entry_safe(tpos, pos, n, head, wh_hash) { + /* hlist_del(pos); */ + kfree(tpos); + } + head++; + } +} + +int au_nhash_test_longer_wh(struct au_nhash *whlist, aufs_bindex_t btgt, + int limit) +{ + int n, i; + struct hlist_head *head; + struct au_vdir_wh *tpos; + struct hlist_node *pos; + + n = 0; + head = whlist->heads; + for (i = 0; i < AuSize_NHASH; i++) { + hlist_for_each_entry(tpos, pos, head, wh_hash) + if (tpos->wh_bindex == btgt && ++n > limit) + return 1; + head++; + } + return 0; +} + +static unsigned int au_name_hash(const unsigned char *name, unsigned int len) +{ + return full_name_hash(name, len) % AuSize_NHASH; +} + +/* returns found or not */ +int au_nhash_test_known_wh(struct au_nhash *whlist, char *name, int namelen) +{ + struct hlist_head *head; + struct au_vdir_wh *tpos; + struct hlist_node *pos; + struct au_vdir_destr *str; + + head = whlist->heads + au_name_hash(name, namelen); + hlist_for_each_entry(tpos, pos, head, wh_hash) { + str = &tpos->wh_str; + AuDbg("%.*s\n", str->len, str->name); + if (str->len == namelen && !memcmp(str->name, name, namelen)) + return 1; + } + return 0; +} + +int au_nhash_append_wh(struct au_nhash *whlist, char *name, int namelen, + aufs_bindex_t bindex) +{ + int err; + struct au_vdir_destr *str; + struct au_vdir_wh *wh; + + err = -ENOMEM; + wh = kmalloc(sizeof(*wh) + namelen, GFP_NOFS); + if (unlikely(!wh)) + goto out; + + err = 0; + wh->wh_bindex = bindex; + str = &wh->wh_str; + str->len = namelen; + memcpy(str->name, name, namelen); + hlist_add_head(&wh->wh_hash, + whlist->heads + au_name_hash(name, namelen)); + /* smp_mb(); */ + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +void au_vdir_free(struct au_vdir *vdir) +{ + au_vdir_deblk_t **deblk; + + deblk = vdir->vd_deblk; + while (vdir->vd_nblk--) + kfree(*deblk++); + kfree(vdir->vd_deblk); + au_cache_free_vdir(vdir); +} + +static int append_deblk(struct au_vdir *vdir) +{ + int err, sz, i; + au_vdir_deblk_t **o; + union au_vdir_deblk_p p, deblk_end; + + err = -ENOMEM; + sz = sizeof(*o) * vdir->vd_nblk; + o = au_kzrealloc(vdir->vd_deblk, sz, sz + sizeof(*o), GFP_NOFS); + if (unlikely(!o)) + goto out; + + vdir->vd_deblk = o; + p.deblk = kmalloc(sizeof(*p.deblk), GFP_NOFS); + if (p.deblk) { + i = vdir->vd_nblk++; + vdir->vd_deblk[i] = p.deblk; + vdir->vd_last.i = i; + vdir->vd_last.p.p = p.p; + deblk_end.deblk = p.deblk + 1; + err = set_deblk_end(&p, &deblk_end); + } + + out: + return err; +} + +static struct au_vdir *alloc_vdir(void) +{ + struct au_vdir *vdir; + int err; + + err = -ENOMEM; + vdir = au_cache_alloc_vdir(); + if (unlikely(!vdir)) + goto out; + + vdir->vd_deblk = kzalloc(sizeof(*vdir->vd_deblk), GFP_NOFS); + if (unlikely(!vdir->vd_deblk)) + goto out_free; + + vdir->vd_nblk = 0; + vdir->vd_version = 0; + vdir->vd_jiffy = 0; + err = append_deblk(vdir); + if (!err) + return vdir; /* success */ + + kfree(vdir->vd_deblk); + + out_free: + au_cache_free_vdir(vdir); + out: + vdir = ERR_PTR(err); + return vdir; +} + +static int reinit_vdir(struct au_vdir *vdir) +{ + int err; + union au_vdir_deblk_p p, deblk_end; + + while (vdir->vd_nblk > 1) { + kfree(vdir->vd_deblk[vdir->vd_nblk - 1]); + vdir->vd_deblk[vdir->vd_nblk - 1] = NULL; + vdir->vd_nblk--; + } + p.deblk = vdir->vd_deblk[0]; + deblk_end.deblk = p.deblk + 1; + err = set_deblk_end(&p, &deblk_end); + vdir->vd_version = 0; + vdir->vd_jiffy = 0; + vdir->vd_last.i = 0; + vdir->vd_last.p.deblk = vdir->vd_deblk[0]; + /* smp_mb(); */ + return err; +} + +/* ---------------------------------------------------------------------- */ + +static void free_dehlist(struct au_nhash *dehlist) +{ + int i; + struct hlist_head *head; + struct au_vdir_dehstr *tpos; + struct hlist_node *pos, *n; + + head = dehlist->heads; + for (i = 0; i < AuSize_NHASH; i++) { + hlist_for_each_entry_safe(tpos, pos, n, head, hash) { + /* hlist_del(pos); */ + au_cache_free_dehstr(tpos); + } + head++; + } +} + +/* returns found(true) or not */ +static int test_known(struct au_nhash *delist, char *name, int namelen) +{ + struct hlist_head *head; + struct au_vdir_dehstr *tpos; + struct hlist_node *pos; + struct au_vdir_destr *str; + + head = delist->heads + au_name_hash(name, namelen); + hlist_for_each_entry(tpos, pos, head, hash) { + str = tpos->str; + AuDbg("%.*s\n", str->len, str->name); + if (str->len == namelen && !memcmp(str->name, name, namelen)) + return 1; + } + return 0; + +} + +static int append_de(struct au_vdir *vdir, char *name, int namelen, ino_t ino, + unsigned int d_type, struct au_nhash *delist) +{ + int err, sz; + union au_vdir_deblk_p p, *room, deblk_end; + struct au_vdir_dehstr *dehstr; + + p.deblk = last_deblk(vdir); + deblk_end.deblk = p.deblk + 1; + room = &vdir->vd_last.p; + AuDebugOn(room->p < p.p || deblk_end.p <= room->p + || !is_deblk_end(room, &deblk_end)); + + sz = calc_size(namelen); + if (unlikely(sz > deblk_end.p - room->p)) { + err = append_deblk(vdir); + if (unlikely(err)) + goto out; + + p.deblk = last_deblk(vdir); + deblk_end.deblk = p.deblk + 1; + /* smp_mb(); */ + AuDebugOn(room->p != p.p); + } + + err = -ENOMEM; + dehstr = au_cache_alloc_dehstr(); + if (unlikely(!dehstr)) + goto out; + + dehstr->str = &room->de->de_str; + hlist_add_head(&dehstr->hash, + delist->heads + au_name_hash(name, namelen)); + room->de->de_ino = ino; + room->de->de_type = d_type; + room->de->de_str.len = namelen; + memcpy(room->de->de_str.name, name, namelen); + + err = 0; + room->p += sz; + if (unlikely(set_deblk_end(room, &deblk_end))) + err = append_deblk(vdir); + /* smp_mb(); */ + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +static int au_ino(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + unsigned int d_type, ino_t *ino) +{ + int err; + struct mutex *mtx; + const int isdir = (d_type == DT_DIR); + + /* prevent hardlinks from race condition */ + mtx = NULL; + if (!isdir) { + mtx = &au_sbr(sb, bindex)->br_xino.xi_nondir_mtx; + mutex_lock(mtx); + } + err = au_xino_read(sb, bindex, h_ino, ino); + if (unlikely(err)) + goto out; + + if (!*ino) { + err = -EIO; + *ino = au_xino_new_ino(sb); + if (unlikely(!*ino)) + goto out; + err = au_xino_write(sb, bindex, h_ino, *ino); + if (unlikely(err)) + goto out; + } + + out: + if (!isdir) + mutex_unlock(mtx); + return err; +} + +#define AuFillVdir_CALLED 1 +#define au_ftest_fillvdir(flags, name) ((flags) & AuFillVdir_##name) +#define au_fset_fillvdir(flags, name) { (flags) |= AuFillVdir_##name; } +#define au_fclr_fillvdir(flags, name) { (flags) &= ~AuFillVdir_##name; } + +struct fillvdir_arg { + struct file *file; + struct au_vdir *vdir; + struct au_nhash *delist; + struct au_nhash *whlist; + aufs_bindex_t bindex; + unsigned int flags; + int err; +}; + +static int fillvdir(void *__arg, const char *__name, int namelen, + loff_t offset __maybe_unused, u64 h_ino, + unsigned int d_type) +{ + struct fillvdir_arg *arg = __arg; + char *name = (void *)__name; + struct super_block *sb; + struct au_nhash *delist, *whlist; + ino_t ino; + aufs_bindex_t bindex, bend; + + bend = arg->bindex; + arg->err = 0; + au_fset_fillvdir(arg->flags, CALLED); + /* smp_mb(); */ + if (namelen <= AUFS_WH_PFX_LEN + || memcmp(name, AUFS_WH_PFX, AUFS_WH_PFX_LEN)) { + delist = arg->delist; + for (bindex = 0; bindex < bend; bindex++) + if (test_known(delist++, name, namelen) + || au_nhash_test_known_wh(arg->whlist + bindex, + name, namelen)) + goto out; /* already exists or whiteouted */ + + ino = 1; /* why does gcc warn? */ + sb = arg->file->f_dentry->d_sb; + arg->err = au_ino(sb, bend, h_ino, d_type, &ino); + if (!arg->err) + arg->err = append_de(arg->vdir, name, namelen, ino, + d_type, arg->delist + bend); + } else { + name += AUFS_WH_PFX_LEN; + namelen -= AUFS_WH_PFX_LEN; + whlist = arg->whlist; + for (bindex = 0; bindex < bend; bindex++) + if (au_nhash_test_known_wh(whlist++, name, namelen)) + goto out; /* already whiteouted */ + + ino = 1; /* dummy */ + if (!arg->err) + arg->err = au_nhash_append_wh + (arg->whlist + bend, name, namelen, bend); + } + + out: + if (!arg->err) + arg->vdir->vd_jiffy = jiffies; + /* smp_mb(); */ + AuTraceErr(arg->err); + return arg->err; +} + +static int au_do_read_vdir(struct fillvdir_arg *arg) +{ + int err; + loff_t offset; + aufs_bindex_t bend, bindex, bstart; + struct file *hf, *file; + struct au_nhash *delist, *whlist; + + err = -ENOMEM; + bend = au_fbend(arg->file); + arg->delist = kmalloc(sizeof(*arg->delist) * (bend + 1), GFP_NOFS); + if (unlikely(!arg->delist)) + goto out; + arg->whlist = kmalloc(sizeof(*arg->whlist) * (bend + 1), GFP_NOFS); + if (unlikely(!arg->whlist)) + goto out_delist; + + err = 0; + delist = arg->delist; + whlist = arg->whlist; + for (bindex = 0; bindex <= bend; bindex++) { + au_nhash_init(delist++); + au_nhash_init(whlist++); + } + + arg->flags = 0; + file = arg->file; + bstart = au_fbstart(file); + for (bindex = bstart; !err && bindex <= bend; bindex++) { + hf = au_h_fptr(file, bindex); + if (!hf) + continue; + + offset = vfsub_llseek(hf, 0, SEEK_SET); + err = offset; + if (unlikely(offset)) + break; + + arg->bindex = bindex; + do { + arg->err = 0; + au_fclr_fillvdir(arg->flags, CALLED); + /* smp_mb(); */ + err = vfsub_readdir(hf, fillvdir, arg); + if (err >= 0) + err = arg->err; + } while (!err && au_ftest_fillvdir(arg->flags, CALLED)); + } + + delist = arg->delist + bstart; + whlist = arg->whlist + bstart; + for (bindex = bstart; bindex <= bend; bindex++) { + free_dehlist(delist++); + au_nhash_fin(whlist++); + } + kfree(arg->whlist); + + out_delist: + kfree(arg->delist); + out: + return err; +} + +static int read_vdir(struct file *file, int may_read) +{ + int err; + unsigned long expire; + unsigned char do_read; + struct fillvdir_arg arg; + struct inode *inode; + struct au_vdir *vdir, *allocated; + + err = 0; + inode = file->f_dentry->d_inode; + IMustLock(inode); + allocated = NULL; + do_read = 0; + expire = au_sbi(inode->i_sb)->si_rdcache; + vdir = au_ivdir(inode); + if (!vdir) { + do_read = 1; + vdir = alloc_vdir(); + err = PTR_ERR(vdir); + if (IS_ERR(vdir)) + goto out; + err = 0; + allocated = vdir; + } else if (may_read + && (inode->i_version != vdir->vd_version + || time_after(jiffies, vdir->vd_jiffy + expire))) { + do_read = 1; + err = reinit_vdir(vdir); + if (unlikely(err)) + goto out; + } + + if (!do_read) + return 0; /* success */ + + arg.file = file; + arg.vdir = vdir; + err = au_do_read_vdir(&arg); + if (!err) { + /* file->f_pos = 0; */ + vdir->vd_version = inode->i_version; + vdir->vd_last.i = 0; + vdir->vd_last.p.deblk = vdir->vd_deblk[0]; + if (allocated) + au_set_ivdir(inode, allocated); + } else if (allocated) + au_vdir_free(allocated); + + out: + return err; +} + +static int copy_vdir(struct au_vdir *tgt, struct au_vdir *src) +{ + int err, i, rerr, n; + + AuDebugOn(tgt->vd_nblk != 1); + + err = -ENOMEM; + if (tgt->vd_nblk < src->vd_nblk) { + au_vdir_deblk_t **p; + + p = au_kzrealloc(tgt->vd_deblk, sizeof(*p) * tgt->vd_nblk, + sizeof(*p) * src->vd_nblk, GFP_NOFS); + if (unlikely(!p)) + goto out; + tgt->vd_deblk = p; + } + + tgt->vd_nblk = src->vd_nblk; + n = src->vd_nblk; + memcpy(tgt->vd_deblk[0], src->vd_deblk[0], AuSize_DEBLK); + /* tgt->vd_last.i = 0; */ + /* tgt->vd_last.p.deblk = tgt->vd_deblk[0]; */ + tgt->vd_version = src->vd_version; + tgt->vd_jiffy = src->vd_jiffy; + + for (i = 1; i < n; i++) { + tgt->vd_deblk[i] = kmalloc(AuSize_DEBLK, GFP_NOFS); + if (tgt->vd_deblk[i]) + memcpy(tgt->vd_deblk[i], src->vd_deblk[i], + AuSize_DEBLK); + else + goto out; + } + /* smp_mb(); */ + return 0; /* success */ + + out: + rerr = reinit_vdir(tgt); + BUG_ON(rerr); + return err; +} + +int au_vdir_init(struct file *file) +{ + int err; + struct inode *inode; + struct au_vdir *vdir_cache, *allocated; + + err = read_vdir(file, !file->f_pos); + if (unlikely(err)) + goto out; + + allocated = NULL; + vdir_cache = au_fvdir_cache(file); + if (!vdir_cache) { + vdir_cache = alloc_vdir(); + err = PTR_ERR(vdir_cache); + if (IS_ERR(vdir_cache)) + goto out; + allocated = vdir_cache; + } else if (!file->f_pos && vdir_cache->vd_version != file->f_version) { + err = reinit_vdir(vdir_cache); + if (unlikely(err)) + goto out; + } else + return 0; /* success */ + + inode = file->f_dentry->d_inode; + err = copy_vdir(vdir_cache, au_ivdir(inode)); + if (!err) { + file->f_version = inode->i_version; + if (allocated) + au_set_fvdir_cache(file, allocated); + } else if (allocated) + au_vdir_free(allocated); + + out: + return err; +} + +static loff_t calc_offset(struct au_vdir *vdir) +{ + loff_t offset; + union au_vdir_deblk_p p; + + p.deblk = vdir->vd_deblk[vdir->vd_last.i]; + offset = vdir->vd_last.p.p - p.p; + offset += sizeof(*p.deblk) * vdir->vd_last.i; + return offset; +} + +/* returns true or false */ +static int seek_vdir(struct file *file) +{ + int valid, i, n; + loff_t offset; + union au_vdir_deblk_p p, deblk_end; + struct au_vdir *vdir_cache; + + valid = 1; + vdir_cache = au_fvdir_cache(file); + offset = calc_offset(vdir_cache); + AuDbg("offset %lld\n", offset); + if (file->f_pos == offset) + goto out; + + vdir_cache->vd_last.i = 0; + vdir_cache->vd_last.p.deblk = vdir_cache->vd_deblk[0]; + if (!file->f_pos) + goto out; + + valid = 0; + i = file->f_pos / AuSize_DEBLK; + AuDbg("i %d\n", i); + if (i >= vdir_cache->vd_nblk) + goto out; + + n = vdir_cache->vd_nblk; + for (; i < n; i++) { + p.deblk = vdir_cache->vd_deblk[i]; + deblk_end.deblk = p.deblk + 1; + offset = i; + offset *= AuSize_DEBLK; + while (!is_deblk_end(&p, &deblk_end) && offset < file->f_pos) { + int l; + + l = calc_size(p.de->de_str.len); + offset += l; + p.p += l; + } + if (!is_deblk_end(&p, &deblk_end)) { + valid = 1; + vdir_cache->vd_last.i = i; + vdir_cache->vd_last.p = p; + break; + } + } + + out: + /* smp_mb(); */ + AuTraceErr(!valid); + return valid; +} + +int au_vdir_fill_de(struct file *file, void *dirent, filldir_t filldir) +{ + int err, l; + union au_vdir_deblk_p deblk_end; + struct au_vdir *vdir_cache; + struct au_vdir_de *de; + + BUILD_BUG_ON(AuSize_DEBLK < NAME_MAX || PAGE_SIZE < AuSize_DEBLK); + + vdir_cache = au_fvdir_cache(file); + if (!seek_vdir(file)) + return 0; + + while (1) { + deblk_end.deblk + = vdir_cache->vd_deblk[vdir_cache->vd_last.i] + 1; + while (!is_deblk_end(&vdir_cache->vd_last.p, &deblk_end)) { + de = vdir_cache->vd_last.p.de; + AuDbg("%.*s, off%lld, i%lu, dt%d\n", + de->de_str.len, de->de_str.name, + file->f_pos, (unsigned long)de->de_ino, + de->de_type); + err = filldir(dirent, de->de_str.name, de->de_str.len, + file->f_pos, de->de_ino, de->de_type); + if (unlikely(err)) { + AuTraceErr(err); + /* todo: ignore the error caused by udba? */ + /* return err; */ + return 0; + } + + l = calc_size(de->de_str.len); + vdir_cache->vd_last.p.p += l; + file->f_pos += l; + } + if (vdir_cache->vd_last.i < vdir_cache->vd_nblk - 1) { + vdir_cache->vd_last.i++; + vdir_cache->vd_last.p.deblk + = vdir_cache->vd_deblk[vdir_cache->vd_last.i]; + file->f_pos = sizeof(*vdir_cache->vd_last.p.deblk) + * vdir_cache->vd_last.i; + continue; + } + break; + } + + /* smp_mb(); */ + return 0; +} diff -urN ./fs/aufs/vfsub.c ./fs/aufs/vfsub.c --- ./fs/aufs/vfsub.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/vfsub.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,731 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sub-routines for VFS + */ + +#include +#include "aufs.h" + +int vfsub_update_h_iattr(struct path *h_path, int *did) +{ + int err; + struct kstat st; + struct super_block *h_sb; + + /* for remote fs, leave work for its getattr or d_revalidate */ + /* for bad i_attr fs, handle them in aufs_getattr() */ + /* still some fs may acquire i_mutex. we need to skip them */ + err = 0; + if (!did) + did = &err; + h_sb = h_path->dentry->d_sb; + *did = (!au_test_fs_remote(h_sb) && au_test_fs_refresh_iattr(h_sb)); + if (*did) + err = vfs_getattr(h_path->mnt, h_path->dentry, &st); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct file *vfsub_filp_open(const char *path, int oflags, int mode) +{ + struct file *file; + + lockdep_off(); + file = filp_open(path, oflags, mode); + lockdep_on(); + if (IS_ERR(file)) + goto out; + AuDebugOn(!file->f_op); + vfsub_update_h_iattr(&file->f_path, /*did*/NULL); /*ignore*/ + + out: + return file; +} + +int vfsub_kern_path(const char *name, unsigned int flags, struct path *path) +{ + int err; + + /* lockdep_off(); */ + err = kern_path(name, flags, path); + /* lockdep_on(); */ + if (!err && path->dentry->d_inode) + vfsub_update_h_iattr(path, /*did*/NULL); /*ignore*/ + return err; +} + +struct dentry *vfsub_lookup_one_len(const char *name, struct dentry *parent, + int len) +{ + struct path path = { + .mnt = NULL + }; + + IMustLock(parent->d_inode); + + path.dentry = lookup_one_len(name, parent, len); + if (IS_ERR(path.dentry)) + goto out; + if (path.dentry->d_inode) + vfsub_update_h_iattr(&path, /*did*/NULL); /*ignore*/ + + out: + return path.dentry; +} + +struct dentry *vfsub_lookup_hash(struct nameidata *nd) +{ + struct path path = { + .mnt = nd->path.mnt + }; + + IMustLock(nd->path.dentry->d_inode); + + path.dentry = lookup_hash(nd); + if (!IS_ERR(path.dentry) && path.dentry->d_inode) + vfsub_update_h_iattr(&path, /*did*/NULL); /*ignore*/ + + return path.dentry; +} + +/* ---------------------------------------------------------------------- */ + +struct dentry *vfsub_lock_rename(struct dentry *d1, struct au_hinode *hdir1, + struct dentry *d2, struct au_hinode *hdir2) +{ + struct dentry *d; + + lockdep_off(); + d = lock_rename(d1, d2); + lockdep_on(); + au_hin_suspend(hdir1); + if (hdir1 != hdir2) + au_hin_suspend(hdir2); + + return d; +} + +void vfsub_unlock_rename(struct dentry *d1, struct au_hinode *hdir1, + struct dentry *d2, struct au_hinode *hdir2) +{ + au_hin_resume(hdir1); + if (hdir1 != hdir2) + au_hin_resume(hdir2); + lockdep_off(); + unlock_rename(d1, d2); + lockdep_on(); +} + +/* ---------------------------------------------------------------------- */ + +int vfsub_create(struct inode *dir, struct path *path, int mode) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_mknod(path, path->dentry, mode, 0); + path->dentry = d; + if (unlikely(err)) + goto out; + + if (au_test_fs_null_nd(dir->i_sb)) + err = vfs_create(dir, path->dentry, mode, NULL); + else { + struct nameidata h_nd; + + memset(&h_nd, 0, sizeof(h_nd)); + h_nd.flags = LOOKUP_CREATE; + h_nd.intent.open.flags = O_CREAT + | vfsub_fmode_to_uint(FMODE_READ); + h_nd.intent.open.create_mode = mode; + h_nd.path.dentry = path->dentry->d_parent; + h_nd.path.mnt = path->mnt; + path_get(&h_nd.path); + err = vfs_create(dir, path->dentry, mode, &h_nd); + path_put(&h_nd.path); + } + + if (!err) { + struct path tmp = *path; + int did; + + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = path->dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +int vfsub_symlink(struct inode *dir, struct path *path, const char *symname) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_symlink(path, path->dentry, symname); + path->dentry = d; + if (unlikely(err)) + goto out; + + err = vfs_symlink(dir, path->dentry, symname); + if (!err) { + struct path tmp = *path; + int did; + + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = path->dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +int vfsub_mknod(struct inode *dir, struct path *path, int mode, dev_t dev) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_mknod(path, path->dentry, mode, dev); + path->dentry = d; + if (unlikely(err)) + goto out; + + err = vfs_mknod(dir, path->dentry, mode, dev); + if (!err) { + struct path tmp = *path; + int did; + + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = path->dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +/* ramfs doesn't check i_nlink in link(2), aufs sets a limit for it */ +static int au_test_nlink(struct inode *inode) +{ +#ifndef CONFIG_AUFS_BR_RAMFS + return 0; +#else + /* borrowing the upper limit from EXT2_LINK_MAX */ + const unsigned int ramfs_link_max = 32000; + + if (!au_test_ramfs(inode->i_sb) + || inode->i_nlink < ramfs_link_max) + return 0; + return -EMLINK; +#endif +} + +int vfsub_link(struct dentry *src_dentry, struct inode *dir, struct path *path) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + err = au_test_nlink(src_dentry->d_inode); + if (unlikely(err)) + return err; + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_link(src_dentry, path, path->dentry); + path->dentry = d; + if (unlikely(err)) + goto out; + + lockdep_off(); + err = vfs_link(src_dentry, dir, path->dentry); + lockdep_on(); + if (!err) { + struct path tmp = *path; + int did; + + /* fuse has different memory inode for the same inumber */ + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = path->dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + tmp.dentry = src_dentry; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +int vfsub_rename(struct inode *src_dir, struct dentry *src_dentry, + struct inode *dir, struct path *path) +{ + int err; + struct path tmp = { + .mnt = path->mnt + }; + struct dentry *d; + + IMustLock(dir); + IMustLock(src_dir); + + d = path->dentry; + path->dentry = d->d_parent; + tmp.dentry = src_dentry->d_parent; + err = security_path_rename(&tmp, src_dentry, path, path->dentry); + path->dentry = d; + if (unlikely(err)) + goto out; + + lockdep_off(); + err = vfs_rename(src_dir, src_dentry, dir, path->dentry); + lockdep_on(); + if (!err) { + int did; + + tmp.dentry = d->d_parent; + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = src_dentry; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + tmp.dentry = src_dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +int vfsub_mkdir(struct inode *dir, struct path *path, int mode) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_mkdir(path, path->dentry, mode); + path->dentry = d; + if (unlikely(err)) + goto out; + + err = vfs_mkdir(dir, path->dentry, mode); + if (!err) { + struct path tmp = *path; + int did; + + vfsub_update_h_iattr(&tmp, &did); + if (did) { + tmp.dentry = path->dentry->d_parent; + vfsub_update_h_iattr(&tmp, /*did*/NULL); + } + /*ignore*/ + } + + out: + return err; +} + +int vfsub_rmdir(struct inode *dir, struct path *path) +{ + int err; + struct dentry *d; + + IMustLock(dir); + + d = path->dentry; + path->dentry = d->d_parent; + err = security_path_rmdir(path, path->dentry); + path->dentry = d; + if (unlikely(err)) + goto out; + + lockdep_off(); + err = vfs_rmdir(dir, path->dentry); + lockdep_on(); + if (!err) { + struct path tmp = { + .dentry = path->dentry->d_parent, + .mnt = path->mnt + }; + + vfsub_update_h_iattr(&tmp, /*did*/NULL); /*ignore*/ + } + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +ssize_t vfsub_read_u(struct file *file, char __user *ubuf, size_t count, + loff_t *ppos) +{ + ssize_t err; + + err = vfs_read(file, ubuf, count, ppos); + if (err >= 0) + vfsub_update_h_iattr(&file->f_path, /*did*/NULL); /*ignore*/ + return err; +} + +/* todo: kernel_read()? */ +ssize_t vfsub_read_k(struct file *file, void *kbuf, size_t count, + loff_t *ppos) +{ + ssize_t err; + mm_segment_t oldfs; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + err = vfsub_read_u(file, (char __user *)kbuf, count, ppos); + set_fs(oldfs); + return err; +} + +ssize_t vfsub_write_u(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + ssize_t err; + + lockdep_off(); + err = vfs_write(file, ubuf, count, ppos); + lockdep_on(); + if (err >= 0) + vfsub_update_h_iattr(&file->f_path, /*did*/NULL); /*ignore*/ + return err; +} + +ssize_t vfsub_write_k(struct file *file, void *kbuf, size_t count, loff_t *ppos) +{ + ssize_t err; + mm_segment_t oldfs; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + err = vfsub_write_u(file, (const char __user *)kbuf, count, ppos); + set_fs(oldfs); + return err; +} + +int vfsub_readdir(struct file *file, filldir_t filldir, void *arg) +{ + int err; + + lockdep_off(); + err = vfs_readdir(file, filldir, arg); + lockdep_on(); + if (err >= 0) + vfsub_update_h_iattr(&file->f_path, /*did*/NULL); /*ignore*/ + return err; +} + +long vfsub_splice_to(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + long err; + + lockdep_off(); + err = vfs_splice_to(in, ppos, pipe, len, flags); + lockdep_on(); + if (err >= 0) + vfsub_update_h_iattr(&in->f_path, /*did*/NULL); /*ignore*/ + return err; +} + +long vfsub_splice_from(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + long err; + + lockdep_off(); + err = vfs_splice_from(pipe, out, ppos, len, flags); + lockdep_on(); + if (err >= 0) + vfsub_update_h_iattr(&out->f_path, /*did*/NULL); /*ignore*/ + return err; +} + +/* cf. open.c:do_sys_truncate() and do_sys_ftruncate() */ +int vfsub_trunc(struct path *h_path, loff_t length, unsigned int attr, + struct file *h_file) +{ + int err; + struct inode *h_inode; + + h_inode = h_path->dentry->d_inode; + if (!h_file) { + err = mnt_want_write(h_path->mnt); + if (err) + goto out; + err = inode_permission(h_inode, MAY_WRITE); + if (err) + goto out_mnt; + err = get_write_access(h_inode); + if (err) + goto out_mnt; + err = break_lease(h_inode, vfsub_fmode_to_uint(FMODE_WRITE)); + if (err) + goto out_inode; + } + + err = locks_verify_truncate(h_inode, h_file, length); + if (!err) + err = security_path_truncate(h_path, length, attr); + if (!err) { + lockdep_off(); + err = do_truncate(h_path->dentry, length, attr, h_file); + lockdep_on(); + } + + out_inode: + if (!h_file) + put_write_access(h_inode); + out_mnt: + if (!h_file) + mnt_drop_write(h_path->mnt); + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct au_vfsub_mkdir_args { + int *errp; + struct inode *dir; + struct path *path; + int mode; +}; + +static void au_call_vfsub_mkdir(void *args) +{ + struct au_vfsub_mkdir_args *a = args; + *a->errp = vfsub_mkdir(a->dir, a->path, a->mode); +} + +int vfsub_sio_mkdir(struct inode *dir, struct path *path, int mode) +{ + int err, do_sio, wkq_err; + + do_sio = au_test_h_perm_sio(dir, MAY_EXEC | MAY_WRITE); + if (!do_sio) + err = vfsub_mkdir(dir, path, mode); + else { + struct au_vfsub_mkdir_args args = { + .errp = &err, + .dir = dir, + .path = path, + .mode = mode + }; + wkq_err = au_wkq_wait(au_call_vfsub_mkdir, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + return err; +} + +struct au_vfsub_rmdir_args { + int *errp; + struct inode *dir; + struct path *path; +}; + +static void au_call_vfsub_rmdir(void *args) +{ + struct au_vfsub_rmdir_args *a = args; + *a->errp = vfsub_rmdir(a->dir, a->path); +} + +int vfsub_sio_rmdir(struct inode *dir, struct path *path) +{ + int err, do_sio, wkq_err; + + do_sio = au_test_h_perm_sio(dir, MAY_EXEC | MAY_WRITE); + if (!do_sio) + err = vfsub_rmdir(dir, path); + else { + struct au_vfsub_rmdir_args args = { + .errp = &err, + .dir = dir, + .path = path + }; + wkq_err = au_wkq_wait(au_call_vfsub_rmdir, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct notify_change_args { + int *errp; + struct path *path; + struct iattr *ia; +}; + +static void call_notify_change(void *args) +{ + struct notify_change_args *a = args; + struct inode *h_inode; + + h_inode = a->path->dentry->d_inode; + IMustLock(h_inode); + + *a->errp = -EPERM; + if (!IS_IMMUTABLE(h_inode) && !IS_APPEND(h_inode)) { + lockdep_off(); + *a->errp = notify_change(a->path->dentry, a->ia); + lockdep_on(); + if (!*a->errp) + vfsub_update_h_iattr(a->path, /*did*/NULL); /*ignore*/ + } + AuTraceErr(*a->errp); +} + +int vfsub_notify_change(struct path *path, struct iattr *ia) +{ + int err; + struct notify_change_args args = { + .errp = &err, + .path = path, + .ia = ia + }; + + call_notify_change(&args); + + return err; +} + +int vfsub_sio_notify_change(struct path *path, struct iattr *ia) +{ + int err, wkq_err; + struct notify_change_args args = { + .errp = &err, + .path = path, + .ia = ia + }; + + wkq_err = au_wkq_wait(call_notify_change, &args); + if (unlikely(wkq_err)) + err = wkq_err; + + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct unlink_args { + int *errp; + struct inode *dir; + struct path *path; +}; + +static void call_unlink(void *args) +{ + struct unlink_args *a = args; + struct dentry *d = a->path->dentry; + struct inode *h_inode; + const int stop_sillyrename = (au_test_nfs(d->d_sb) + && atomic_read(&d->d_count) == 1); + + IMustLock(a->dir); + + a->path->dentry = d->d_parent; + *a->errp = security_path_unlink(a->path, d); + a->path->dentry = d; + if (unlikely(*a->errp)) + return; + + if (!stop_sillyrename) + dget(d); + h_inode = d->d_inode; + if (h_inode) + atomic_inc(&h_inode->i_count); + + lockdep_off(); + *a->errp = vfs_unlink(a->dir, d); + lockdep_on(); + if (!*a->errp) { + struct path tmp = { + .dentry = d->d_parent, + .mnt = a->path->mnt + }; + vfsub_update_h_iattr(&tmp, /*did*/NULL); /*ignore*/ + } + + if (!stop_sillyrename) + dput(d); + if (h_inode) + iput(h_inode); + + AuTraceErr(*a->errp); +} + +/* + * @dir: must be locked. + * @dentry: target dentry. + */ +int vfsub_unlink(struct inode *dir, struct path *path, int force) +{ + int err; + struct unlink_args args = { + .errp = &err, + .dir = dir, + .path = path + }; + + if (!force) + call_unlink(&args); + else { + int wkq_err; + + wkq_err = au_wkq_wait(call_unlink, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + + return err; +} diff -urN ./fs/aufs/vfsub.h ./fs/aufs/vfsub.h --- ./fs/aufs/vfsub.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/vfsub.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,165 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * sub-routines for VFS + */ + +#ifndef __AUFS_VFSUB_H__ +#define __AUFS_VFSUB_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include + +/* ---------------------------------------------------------------------- */ + +/* lock subclass for lower inode */ +/* default MAX_LOCKDEP_SUBCLASSES(8) is not enough */ +/* reduce? gave up. */ +enum { + AuLsc_I_Begin = I_MUTEX_QUOTA, /* 4 */ + AuLsc_I_PARENT, /* lower inode, parent first */ + AuLsc_I_PARENT2, /* copyup dirs */ + AuLsc_I_CHILD, + AuLsc_I_CHILD2, + AuLsc_I_End +}; + +/* to debug easier, do not make them inlined functions */ +#define MtxMustLock(mtx) AuDebugOn(!mutex_is_locked(mtx)) +#define IMustLock(i) MtxMustLock(&(i)->i_mutex) + +/* ---------------------------------------------------------------------- */ + +static inline void vfsub_copy_inode_size(struct inode *inode, + struct inode *h_inode) +{ + spin_lock(&inode->i_lock); + fsstack_copy_inode_size(inode, h_inode); + spin_unlock(&inode->i_lock); +} + +int vfsub_update_h_iattr(struct path *h_path, int *did); +struct file *vfsub_filp_open(const char *path, int oflags, int mode); +int vfsub_kern_path(const char *name, unsigned int flags, struct path *path); +struct dentry *vfsub_lookup_one_len(const char *name, struct dentry *parent, + int len); +struct dentry *vfsub_lookup_hash(struct nameidata *nd); + +/* ---------------------------------------------------------------------- */ + +struct au_hinode; +struct dentry *vfsub_lock_rename(struct dentry *d1, struct au_hinode *hdir1, + struct dentry *d2, struct au_hinode *hdir2); +void vfsub_unlock_rename(struct dentry *d1, struct au_hinode *hdir1, + struct dentry *d2, struct au_hinode *hdir2); + +int vfsub_create(struct inode *dir, struct path *path, int mode); +int vfsub_symlink(struct inode *dir, struct path *path, + const char *symname); +int vfsub_mknod(struct inode *dir, struct path *path, int mode, dev_t dev); +int vfsub_link(struct dentry *src_dentry, struct inode *dir, + struct path *path); +int vfsub_rename(struct inode *src_hdir, struct dentry *src_dentry, + struct inode *hdir, struct path *path); +int vfsub_mkdir(struct inode *dir, struct path *path, int mode); +int vfsub_rmdir(struct inode *dir, struct path *path); + +/* ---------------------------------------------------------------------- */ + +ssize_t vfsub_read_u(struct file *file, char __user *ubuf, size_t count, + loff_t *ppos); +ssize_t vfsub_read_k(struct file *file, void *kbuf, size_t count, + loff_t *ppos); +ssize_t vfsub_write_u(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos); +ssize_t vfsub_write_k(struct file *file, void *kbuf, size_t count, + loff_t *ppos); +int vfsub_readdir(struct file *file, filldir_t filldir, void *arg); + +static inline void vfsub_file_accessed(struct file *h_file) +{ + file_accessed(h_file); + vfsub_update_h_iattr(&h_file->f_path, /*did*/NULL); /*ignore*/ +} + +static inline void vfsub_touch_atime(struct vfsmount *h_mnt, + struct dentry *h_dentry) +{ + struct path h_path = { + .dentry = h_dentry, + .mnt = h_mnt + }; + touch_atime(h_mnt, h_dentry); + vfsub_update_h_iattr(&h_path, /*did*/NULL); /*ignore*/ +} + +long vfsub_splice_to(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags); +long vfsub_splice_from(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags); +int vfsub_trunc(struct path *h_path, loff_t length, unsigned int attr, + struct file *h_file); + +/* ---------------------------------------------------------------------- */ + +static inline loff_t vfsub_llseek(struct file *file, loff_t offset, int origin) +{ + loff_t err; + + lockdep_off(); + err = vfs_llseek(file, offset, origin); + lockdep_on(); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* dirty workaround for strict type of fmode_t */ +union vfsub_fmu { + fmode_t fm; + unsigned int ui; +}; + +static inline unsigned int vfsub_fmode_to_uint(fmode_t fm) +{ + union vfsub_fmu u = { + .fm = fm + }; + + BUILD_BUG_ON(sizeof(u.fm) != sizeof(u.ui)); + + return u.ui; +} + +static inline fmode_t vfsub_uint_to_fmode(unsigned int ui) +{ + union vfsub_fmu u = { + .ui = ui + }; + + return u.fm; +} + +/* ---------------------------------------------------------------------- */ + +int vfsub_sio_mkdir(struct inode *dir, struct path *path, int mode); +int vfsub_sio_rmdir(struct inode *dir, struct path *path); +int vfsub_sio_notify_change(struct path *path, struct iattr *ia); +int vfsub_notify_change(struct path *path, struct iattr *ia); +int vfsub_unlink(struct inode *dir, struct path *path, int force); + +#endif /* __KERNEL__ */ +#endif /* __AUFS_VFSUB_H__ */ diff -urN ./fs/aufs/wbr_policy.c ./fs/aufs/wbr_policy.c --- ./fs/aufs/wbr_policy.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/wbr_policy.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,628 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * policies for selecting one among multiple writable branches + */ + +#include +#include "aufs.h" + +/* subset of cpup_attr() */ +static noinline_for_stack +int au_cpdown_attr(struct path *h_path, struct dentry *h_src) +{ + int err, sbits; + struct iattr ia; + struct inode *h_isrc; + + h_isrc = h_src->d_inode; + ia.ia_valid = ATTR_FORCE | ATTR_MODE | ATTR_UID | ATTR_GID; + ia.ia_mode = h_isrc->i_mode; + ia.ia_uid = h_isrc->i_uid; + ia.ia_gid = h_isrc->i_gid; + sbits = !!(ia.ia_mode & (S_ISUID | S_ISGID)); + au_cpup_attr_flags(h_path->dentry->d_inode, h_isrc); + err = vfsub_sio_notify_change(h_path, &ia); + + /* is this nfs only? */ + if (!err && sbits && au_test_nfs(h_path->dentry->d_sb)) { + ia.ia_valid = ATTR_FORCE | ATTR_MODE; + ia.ia_mode = h_isrc->i_mode; + err = vfsub_sio_notify_change(h_path, &ia); + } + + return err; +} + +#define AuCpdown_PARENT_OPQ 1 +#define AuCpdown_WHED (1 << 1) +#define AuCpdown_MADE_DIR (1 << 2) +#define AuCpdown_DIROPQ (1 << 3) +#define au_ftest_cpdown(flags, name) ((flags) & AuCpdown_##name) +#define au_fset_cpdown(flags, name) { (flags) |= AuCpdown_##name; } +#define au_fclr_cpdown(flags, name) { (flags) &= ~AuCpdown_##name; } + +struct au_cpdown_dir_args { + struct dentry *parent; + unsigned int flags; +}; + +static int au_cpdown_dir_opq(struct dentry *dentry, aufs_bindex_t bdst, + struct au_cpdown_dir_args *a) +{ + int err; + struct dentry *opq_dentry; + + opq_dentry = au_diropq_create(dentry, bdst); + err = PTR_ERR(opq_dentry); + if (IS_ERR(opq_dentry)) + goto out; + dput(opq_dentry); + au_fset_cpdown(a->flags, DIROPQ); + + out: + return err; +} + +static int au_cpdown_dir_wh(struct dentry *dentry, struct dentry *h_parent, + struct inode *dir, aufs_bindex_t bdst) +{ + int err; + struct path h_path; + struct au_branch *br; + + br = au_sbr(dentry->d_sb, bdst); + h_path.dentry = au_wh_lkup(h_parent, &dentry->d_name, br); + err = PTR_ERR(h_path.dentry); + if (IS_ERR(h_path.dentry)) + goto out; + + err = 0; + if (h_path.dentry->d_inode) { + h_path.mnt = br->br_mnt; + err = au_wh_unlink_dentry(au_h_iptr(dir, bdst), &h_path, + dentry); + } + dput(h_path.dentry); + + out: + return err; +} + +static int au_cpdown_dir(struct dentry *dentry, aufs_bindex_t bdst, + struct dentry *h_parent, void *arg) +{ + int err, rerr; + aufs_bindex_t bend, bopq, bstart; + unsigned char parent_opq; + struct path h_path; + struct dentry *parent; + struct inode *h_dir, *h_inode, *inode, *dir; + struct au_cpdown_dir_args *args = arg; + + bstart = au_dbstart(dentry); + /* dentry is di-locked */ + parent = dget_parent(dentry); + dir = parent->d_inode; + h_dir = h_parent->d_inode; + AuDebugOn(h_dir != au_h_iptr(dir, bdst)); + IMustLock(h_dir); + + err = au_lkup_neg(dentry, bdst); + if (unlikely(err < 0)) + goto out; + h_path.dentry = au_h_dptr(dentry, bdst); + h_path.mnt = au_sbr_mnt(dentry->d_sb, bdst); + err = vfsub_sio_mkdir(au_h_iptr(dir, bdst), &h_path, + S_IRWXU | S_IRUGO | S_IXUGO); + if (unlikely(err)) + goto out_put; + au_fset_cpdown(args->flags, MADE_DIR); + + bend = au_dbend(dentry); + bopq = au_dbdiropq(dentry); + au_fclr_cpdown(args->flags, WHED); + au_fclr_cpdown(args->flags, DIROPQ); + if (au_dbwh(dentry) == bdst) + au_fset_cpdown(args->flags, WHED); + if (!au_ftest_cpdown(args->flags, PARENT_OPQ) && bopq <= bdst) + au_fset_cpdown(args->flags, PARENT_OPQ); + parent_opq = (au_ftest_cpdown(args->flags, PARENT_OPQ) + && args->parent == dentry); + h_inode = h_path.dentry->d_inode; + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + if (au_ftest_cpdown(args->flags, WHED)) { + err = au_cpdown_dir_opq(dentry, bdst, args); + if (unlikely(err)) { + mutex_unlock(&h_inode->i_mutex); + goto out_dir; + } + } + + err = au_cpdown_attr(&h_path, au_h_dptr(dentry, bstart)); + mutex_unlock(&h_inode->i_mutex); + if (unlikely(err)) + goto out_opq; + + if (au_ftest_cpdown(args->flags, WHED)) { + err = au_cpdown_dir_wh(dentry, h_parent, dir, bdst); + if (unlikely(err)) + goto out_opq; + } + + inode = dentry->d_inode; + if (au_ibend(inode) < bdst) + au_set_ibend(inode, bdst); + au_set_h_iptr(inode, bdst, au_igrab(h_inode), + au_hi_flags(inode, /*isdir*/1)); + goto out; /* success */ + + /* revert */ + out_opq: + if (au_ftest_cpdown(args->flags, DIROPQ)) { + mutex_lock_nested(&h_inode->i_mutex, AuLsc_I_CHILD); + rerr = au_diropq_remove(dentry, bdst); + mutex_unlock(&h_inode->i_mutex); + if (unlikely(rerr)) { + AuIOErr("failed removing diropq for %.*s b%d (%d)\n", + AuDLNPair(dentry), bdst, rerr); + err = -EIO; + goto out; + } + } + out_dir: + if (au_ftest_cpdown(args->flags, MADE_DIR)) { + rerr = vfsub_sio_rmdir(au_h_iptr(dir, bdst), &h_path); + if (unlikely(rerr)) { + AuIOErr("failed removing %.*s b%d (%d)\n", + AuDLNPair(dentry), bdst, rerr); + err = -EIO; + } + } + out_put: + au_set_h_dptr(dentry, bdst, NULL); + if (au_dbend(dentry) == bdst) + au_update_dbend(dentry); + out: + dput(parent); + return err; +} + +int au_cpdown_dirs(struct dentry *dentry, aufs_bindex_t bdst) +{ + int err; + struct au_cpdown_dir_args args = { + .parent = dget_parent(dentry), + .flags = 0 + }; + + err = au_cp_dirs(dentry, bdst, au_cpdown_dir, &args); + dput(args.parent); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* policies for create */ + +static int au_wbr_bu(struct super_block *sb, aufs_bindex_t bindex) +{ + for (; bindex >= 0; bindex--) + if (!au_br_rdonly(au_sbr(sb, bindex))) + return bindex; + return -EROFS; +} + +/* top down parent */ +static int au_wbr_create_tdp(struct dentry *dentry, int isdir __maybe_unused) +{ + int err; + aufs_bindex_t bstart, bindex; + struct super_block *sb; + struct dentry *parent, *h_parent; + + sb = dentry->d_sb; + bstart = au_dbstart(dentry); + err = bstart; + if (!au_br_rdonly(au_sbr(sb, bstart))) + goto out; + + err = -EROFS; + parent = dget_parent(dentry); + for (bindex = au_dbstart(parent); bindex < bstart; bindex++) { + h_parent = au_h_dptr(parent, bindex); + if (!h_parent || !h_parent->d_inode) + continue; + + if (!au_br_rdonly(au_sbr(sb, bindex))) { + err = bindex; + break; + } + } + dput(parent); + + /* bottom up here */ + if (unlikely(err < 0)) + err = au_wbr_bu(sb, bstart - 1); + + out: + AuDbg("b%d\n", err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* an exception for the policy other than tdp */ +static int au_wbr_create_exp(struct dentry *dentry) +{ + int err; + aufs_bindex_t bwh, bdiropq; + struct dentry *parent; + + err = -1; + bwh = au_dbwh(dentry); + parent = dget_parent(dentry); + bdiropq = au_dbdiropq(parent); + if (bwh >= 0) { + if (bdiropq >= 0) + err = min(bdiropq, bwh); + else + err = bwh; + AuDbg("%d\n", err); + } else if (bdiropq >= 0) { + err = bdiropq; + AuDbg("%d\n", err); + } + dput(parent); + + if (err >= 0 && au_br_rdonly(au_sbr(dentry->d_sb, err))) + err = -1; + + AuDbg("%d\n", err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* round robin */ +static int au_wbr_create_init_rr(struct super_block *sb) +{ + int err; + + err = au_wbr_bu(sb, au_sbend(sb)); + atomic_set(&au_sbi(sb)->si_wbr_rr_next, -err); /* less important */ + + AuDbg("b%d\n", err); + return err; +} + +static int au_wbr_create_rr(struct dentry *dentry, int isdir) +{ + int err, nbr; + unsigned int u; + aufs_bindex_t bindex, bend; + struct super_block *sb; + atomic_t *next; + + err = au_wbr_create_exp(dentry); + if (err >= 0) + goto out; + + sb = dentry->d_sb; + next = &au_sbi(sb)->si_wbr_rr_next; + bend = au_sbend(sb); + nbr = bend + 1; + for (bindex = 0; bindex <= bend; bindex++) { + if (!isdir) { + err = atomic_dec_return(next) + 1; + /* modulo for 0 is meaningless */ + if (unlikely(!err)) + err = atomic_dec_return(next) + 1; + } else + err = atomic_read(next); + AuDbg("%d\n", err); + u = err; + err = u % nbr; + AuDbg("%d\n", err); + if (!au_br_rdonly(au_sbr(sb, err))) + break; + err = -EROFS; + } + + out: + AuDbg("%d\n", err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* most free space */ +static void au_mfs(struct dentry *dentry) +{ + struct super_block *sb; + struct au_branch *br; + struct au_wbr_mfs *mfs; + aufs_bindex_t bindex, bend; + int err; + unsigned long long b, bavail; + /* reduce the stack usage */ + struct kstatfs *st; + + st = kmalloc(sizeof(*st), GFP_NOFS); + if (unlikely(!st)) { + AuWarn1("failed updating mfs(%d), ignored\n", -ENOMEM); + return; + } + + bavail = 0; + sb = dentry->d_sb; + mfs = &au_sbi(sb)->si_wbr_mfs; + mfs->mfs_bindex = -EROFS; + mfs->mfsrr_bytes = 0; + bend = au_sbend(sb); + for (bindex = 0; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + if (au_br_rdonly(br)) + continue; + + /* sb->s_root for NFS is unreliable */ + err = vfs_statfs(br->br_mnt->mnt_root, st); + if (unlikely(err)) { + AuWarn1("failed statfs, b%d, %d\n", bindex, err); + continue; + } + + /* when the available size is equal, select the lower one */ + BUILD_BUG_ON(sizeof(b) < sizeof(st->f_bavail) + || sizeof(b) < sizeof(st->f_bsize)); + b = st->f_bavail * st->f_bsize; + br->br_wbr->wbr_bytes = b; + if (b >= bavail) { + bavail = b; + mfs->mfs_bindex = bindex; + mfs->mfs_jiffy = jiffies; + } + } + + mfs->mfsrr_bytes = bavail; + AuDbg("b%d\n", mfs->mfs_bindex); + kfree(st); +} + +static int au_wbr_create_mfs(struct dentry *dentry, int isdir __maybe_unused) +{ + int err; + struct super_block *sb; + struct au_wbr_mfs *mfs; + + err = au_wbr_create_exp(dentry); + if (err >= 0) + goto out; + + sb = dentry->d_sb; + mfs = &au_sbi(sb)->si_wbr_mfs; + mutex_lock(&mfs->mfs_lock); + if (time_after(jiffies, mfs->mfs_jiffy + mfs->mfs_expire) + || mfs->mfs_bindex < 0 + || au_br_rdonly(au_sbr(sb, mfs->mfs_bindex))) + au_mfs(dentry); + mutex_unlock(&mfs->mfs_lock); + err = mfs->mfs_bindex; + + out: + AuDbg("b%d\n", err); + return err; +} + +static int au_wbr_create_init_mfs(struct super_block *sb) +{ + struct au_wbr_mfs *mfs; + + mfs = &au_sbi(sb)->si_wbr_mfs; + mutex_init(&mfs->mfs_lock); + mfs->mfs_jiffy = 0; + mfs->mfs_bindex = -EROFS; + + return 0; +} + +static int au_wbr_create_fin_mfs(struct super_block *sb __maybe_unused) +{ + mutex_destroy(&au_sbi(sb)->si_wbr_mfs.mfs_lock); + return 0; +} + +/* ---------------------------------------------------------------------- */ + +/* most free space and then round robin */ +static int au_wbr_create_mfsrr(struct dentry *dentry, int isdir) +{ + int err; + struct au_wbr_mfs *mfs; + + err = au_wbr_create_mfs(dentry, isdir); + if (err >= 0) { + mfs = &au_sbi(dentry->d_sb)->si_wbr_mfs; + if (mfs->mfsrr_bytes < mfs->mfsrr_watermark) + err = au_wbr_create_rr(dentry, isdir); + } + + AuDbg("b%d\n", err); + return err; +} + +static int au_wbr_create_init_mfsrr(struct super_block *sb) +{ + int err; + + au_wbr_create_init_mfs(sb); /* ignore */ + err = au_wbr_create_init_rr(sb); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* top down parent and most free space */ +static int au_wbr_create_pmfs(struct dentry *dentry, int isdir) +{ + int err, e2; + unsigned long long b; + aufs_bindex_t bindex, bstart, bend; + struct super_block *sb; + struct dentry *parent, *h_parent; + struct au_branch *br; + + err = au_wbr_create_tdp(dentry, isdir); + if (unlikely(err < 0)) + goto out; + parent = dget_parent(dentry); + bstart = au_dbstart(parent); + bend = au_dbtaildir(parent); + if (bstart == bend) + goto out_parent; /* success */ + + e2 = au_wbr_create_mfs(dentry, isdir); + if (e2 < 0) + goto out_parent; /* success */ + + /* when the available size is equal, select upper one */ + sb = dentry->d_sb; + br = au_sbr(sb, err); + b = br->br_wbr->wbr_bytes; + AuDbg("b%d, %llu\n", err, b); + + for (bindex = bstart; bindex <= bend; bindex++) { + h_parent = au_h_dptr(parent, bindex); + if (!h_parent || !h_parent->d_inode) + continue; + + br = au_sbr(sb, bindex); + if (!au_br_rdonly(br) && br->br_wbr->wbr_bytes > b) { + b = br->br_wbr->wbr_bytes; + err = bindex; + AuDbg("b%d, %llu\n", err, b); + } + } + + out_parent: + dput(parent); + out: + AuDbg("b%d\n", err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* policies for copyup */ + +/* top down parent */ +static int au_wbr_copyup_tdp(struct dentry *dentry) +{ + return au_wbr_create_tdp(dentry, /*isdir, anything is ok*/0); +} + +/* bottom up parent */ +static int au_wbr_copyup_bup(struct dentry *dentry) +{ + int err; + aufs_bindex_t bindex, bstart; + struct dentry *parent, *h_parent; + struct super_block *sb; + + err = -EROFS; + sb = dentry->d_sb; + parent = dget_parent(dentry); + bstart = au_dbstart(parent); + for (bindex = au_dbstart(dentry); bindex >= bstart; bindex--) { + h_parent = au_h_dptr(parent, bindex); + if (!h_parent || !h_parent->d_inode) + continue; + + if (!au_br_rdonly(au_sbr(sb, bindex))) { + err = bindex; + break; + } + } + dput(parent); + + /* bottom up here */ + if (unlikely(err < 0)) + err = au_wbr_bu(sb, bstart - 1); + + AuDbg("b%d\n", err); + return err; +} + +/* bottom up */ +static int au_wbr_copyup_bu(struct dentry *dentry) +{ + int err; + + err = au_wbr_bu(dentry->d_sb, au_dbstart(dentry)); + + AuDbg("b%d\n", err); + return err; +} + +/* ---------------------------------------------------------------------- */ + +struct au_wbr_copyup_operations au_wbr_copyup_ops[] = { + [AuWbrCopyup_TDP] = { + .copyup = au_wbr_copyup_tdp + }, + [AuWbrCopyup_BUP] = { + .copyup = au_wbr_copyup_bup + }, + [AuWbrCopyup_BU] = { + .copyup = au_wbr_copyup_bu + } +}; + +struct au_wbr_create_operations au_wbr_create_ops[] = { + [AuWbrCreate_TDP] = { + .create = au_wbr_create_tdp + }, + [AuWbrCreate_RR] = { + .create = au_wbr_create_rr, + .init = au_wbr_create_init_rr + }, + [AuWbrCreate_MFS] = { + .create = au_wbr_create_mfs, + .init = au_wbr_create_init_mfs, + .fin = au_wbr_create_fin_mfs + }, + [AuWbrCreate_MFSV] = { + .create = au_wbr_create_mfs, + .init = au_wbr_create_init_mfs, + .fin = au_wbr_create_fin_mfs + }, + [AuWbrCreate_MFSRR] = { + .create = au_wbr_create_mfsrr, + .init = au_wbr_create_init_mfsrr, + .fin = au_wbr_create_fin_mfs + }, + [AuWbrCreate_MFSRRV] = { + .create = au_wbr_create_mfsrr, + .init = au_wbr_create_init_mfsrr, + .fin = au_wbr_create_fin_mfs + }, + [AuWbrCreate_PMFS] = { + .create = au_wbr_create_pmfs, + .init = au_wbr_create_init_mfs, + .fin = au_wbr_create_fin_mfs + }, + [AuWbrCreate_PMFSV] = { + .create = au_wbr_create_pmfs, + .init = au_wbr_create_init_mfs, + .fin = au_wbr_create_fin_mfs + } +}; diff -urN ./fs/aufs/whout.c ./fs/aufs/whout.c --- ./fs/aufs/whout.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/whout.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,1010 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * whiteout for logical deletion and opaque directory + */ + +#include +#include "aufs.h" + +#define WH_MASK S_IRUGO + +/* + * If a directory contains this file, then it is opaque. We start with the + * .wh. flag so that it is blocked by lookup. + */ +static struct qstr diropq_name = { + .name = AUFS_WH_DIROPQ, + .len = sizeof(AUFS_WH_DIROPQ) - 1 +}; + +/* + * generate whiteout name, which is NOT terminated by NULL. + * @name: original d_name.name + * @len: original d_name.len + * @wh: whiteout qstr + * returns zero when succeeds, otherwise error. + * succeeded value as wh->name should be freed by kfree(). + */ +int au_wh_name_alloc(struct qstr *wh, const struct qstr *name) +{ + char *p; + + if (unlikely(name->len > PATH_MAX - AUFS_WH_PFX_LEN)) + return -ENAMETOOLONG; + + wh->len = name->len + AUFS_WH_PFX_LEN; + p = kmalloc(wh->len, GFP_NOFS); + wh->name = p; + if (p) { + memcpy(p, AUFS_WH_PFX, AUFS_WH_PFX_LEN); + memcpy(p + AUFS_WH_PFX_LEN, name->name, name->len); + /* smp_mb(); */ + return 0; + } + return -ENOMEM; +} + +/* ---------------------------------------------------------------------- */ + +/* + * test if the @wh_name exists under @h_parent. + * @try_sio specifies the necessary of super-io. + */ +int au_wh_test(struct dentry *h_parent, struct qstr *wh_name, + struct au_branch *br, int try_sio) +{ + int err; + struct dentry *wh_dentry; + struct inode *h_dir; + + h_dir = h_parent->d_inode; + if (!try_sio) + wh_dentry = au_lkup_one(wh_name, h_parent, br, /*nd*/NULL); + else + wh_dentry = au_sio_lkup_one(wh_name, h_parent, br); + err = PTR_ERR(wh_dentry); + if (IS_ERR(wh_dentry)) + goto out; + + err = 0; + if (!wh_dentry->d_inode) + goto out_wh; /* success */ + + err = 1; + if (S_ISREG(wh_dentry->d_inode->i_mode)) + goto out_wh; /* success */ + + err = -EIO; + AuIOErr("%.*s Invalid whiteout entry type 0%o.\n", + AuDLNPair(wh_dentry), wh_dentry->d_inode->i_mode); + + out_wh: + dput(wh_dentry); + out: + return err; +} + +/* + * test if the @h_dentry sets opaque or not. + */ +int au_diropq_test(struct dentry *h_dentry, struct au_branch *br) +{ + int err; + struct inode *h_dir; + + h_dir = h_dentry->d_inode; + err = au_wh_test(h_dentry, &diropq_name, br, + au_test_h_perm_sio(h_dir, MAY_EXEC)); + return err; +} + +/* + * returns a negative dentry whose name is unique and temporary. + */ +struct dentry *au_whtmp_lkup(struct dentry *h_parent, struct au_branch *br, + struct qstr *prefix) +{ +#define HEX_LEN 4 + struct dentry *dentry; + int i; + char defname[AUFS_WH_PFX_LEN * 2 + DNAME_INLINE_LEN_MIN + 1 + + HEX_LEN + 1], *name, *p; + static unsigned short cnt; + struct qstr qs; + + name = defname; + qs.len = sizeof(defname) - DNAME_INLINE_LEN_MIN + prefix->len - 1; + if (unlikely(prefix->len > DNAME_INLINE_LEN_MIN)) { + dentry = ERR_PTR(-ENAMETOOLONG); + if (unlikely(qs.len >= PATH_MAX)) + goto out; + dentry = ERR_PTR(-ENOMEM); + name = kmalloc(qs.len + 1, GFP_NOFS); + if (unlikely(!name)) + goto out; + } + + /* doubly whiteout-ed */ + memcpy(name, AUFS_WH_PFX AUFS_WH_PFX, AUFS_WH_PFX_LEN * 2); + p = name + AUFS_WH_PFX_LEN * 2; + memcpy(p, prefix->name, prefix->len); + p += prefix->len; + *p++ = '.'; + AuDebugOn(name + qs.len + 1 - p <= HEX_LEN); + + qs.name = name; + for (i = 0; i < 3; i++) { + sprintf(p, "%.*d", HEX_LEN, cnt++); + dentry = au_sio_lkup_one(&qs, h_parent, br); + if (IS_ERR(dentry) || !dentry->d_inode) + goto out_name; + dput(dentry); + } + /* AuWarn("could not get random name\n"); */ + dentry = ERR_PTR(-EEXIST); + AuDbg("%.*s\n", AuLNPair(&qs)); + BUG(); + + out_name: + if (name != defname) + kfree(name); + out: + return dentry; +#undef HEX_LEN +} + +/* + * rename the @h_dentry on @br to the whiteouted temporary name. + */ +int au_whtmp_ren(struct dentry *h_dentry, struct au_branch *br) +{ + int err; + struct path h_path = { + .mnt = br->br_mnt + }; + struct inode *h_dir; + struct dentry *h_parent; + + h_parent = h_dentry->d_parent; /* dir inode is locked */ + h_dir = h_parent->d_inode; + IMustLock(h_dir); + + h_path.dentry = au_whtmp_lkup(h_parent, br, &h_dentry->d_name); + err = PTR_ERR(h_path.dentry); + if (IS_ERR(h_path.dentry)) + goto out; + + /* under the same dir, no need to lock_rename() */ + err = vfsub_rename(h_dir, h_dentry, h_dir, &h_path); + AuTraceErr(err); + dput(h_path.dentry); + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ +/* + * functions for removing a whiteout + */ + +static int do_unlink_wh(struct inode *h_dir, struct path *h_path) +{ + int force; + + /* + * forces superio when the dir has a sticky bit. + * this may be a violation of unix fs semantics. + */ + force = (h_dir->i_mode & S_ISVTX) + && h_path->dentry->d_inode->i_uid != current_fsuid(); + return vfsub_unlink(h_dir, h_path, force); +} + +int au_wh_unlink_dentry(struct inode *h_dir, struct path *h_path, + struct dentry *dentry) +{ + int err; + + err = do_unlink_wh(h_dir, h_path); + if (!err && dentry) + au_set_dbwh(dentry, -1); + + return err; +} + +static int unlink_wh_name(struct dentry *h_parent, struct qstr *wh, + struct au_branch *br) +{ + int err; + struct path h_path = { + .mnt = br->br_mnt + }; + + err = 0; + h_path.dentry = au_lkup_one(wh, h_parent, br, /*nd*/NULL); + if (IS_ERR(h_path.dentry)) + err = PTR_ERR(h_path.dentry); + else { + if (h_path.dentry->d_inode + && S_ISREG(h_path.dentry->d_inode->i_mode)) + err = do_unlink_wh(h_parent->d_inode, &h_path); + dput(h_path.dentry); + } + + return err; +} + +/* ---------------------------------------------------------------------- */ +/* + * initialize/clean whiteout for a branch + */ + +static void au_wh_clean(struct inode *h_dir, struct path *whpath, + const int isdir) +{ + int err; + + if (!whpath->dentry->d_inode) + return; + + err = mnt_want_write(whpath->mnt); + if (!err) { + if (isdir) + err = vfsub_rmdir(h_dir, whpath); + else + err = vfsub_unlink(h_dir, whpath, /*force*/0); + mnt_drop_write(whpath->mnt); + } + if (unlikely(err)) + AuWarn("failed removing %.*s (%d), ignored.\n", + AuDLNPair(whpath->dentry), err); +} + +static int test_linkable(struct dentry *h_root) +{ + struct inode *h_dir = h_root->d_inode; + + if (h_dir->i_op->link) + return 0; + + AuErr("%.*s (%s) doesn't support link(2), use noplink and rw+nolwh\n", + AuDLNPair(h_root), au_sbtype(h_root->d_sb)); + return -ENOSYS; +} + +/* todo: should this mkdir be done in /sbin/mount.aufs helper? */ +static int au_whdir(struct inode *h_dir, struct path *path) +{ + int err; + + err = -EEXIST; + if (!path->dentry->d_inode) { + int mode = S_IRWXU; + + if (au_test_nfs(path->dentry->d_sb)) + mode |= S_IXUGO; + err = mnt_want_write(path->mnt); + if (!err) { + err = vfsub_mkdir(h_dir, path, mode); + mnt_drop_write(path->mnt); + } + } else if (S_ISDIR(path->dentry->d_inode->i_mode)) + err = 0; + else + AuErr("unknown %.*s exists\n", AuDLNPair(path->dentry)); + + return err; +} + +struct au_wh_base { + const struct qstr *name; + struct dentry *dentry; +}; + +static void au_wh_init_ro(struct inode *h_dir, struct au_wh_base base[], + struct path *h_path) +{ + h_path->dentry = base[AuBrWh_BASE].dentry; + au_wh_clean(h_dir, h_path, /*isdir*/0); + h_path->dentry = base[AuBrWh_PLINK].dentry; + au_wh_clean(h_dir, h_path, /*isdir*/1); + h_path->dentry = base[AuBrWh_ORPH].dentry; + au_wh_clean(h_dir, h_path, /*isdir*/1); +} + +/* + * returns tri-state, + * minus: error, caller should print the mesage + * zero: succuess + * plus: error, caller should NOT print the mesage + */ +static int au_wh_init_rw_nolink(struct dentry *h_root, struct au_wbr *wbr, + int do_plink, struct au_wh_base base[], + struct path *h_path) +{ + int err; + struct inode *h_dir; + + h_dir = h_root->d_inode; + h_path->dentry = base[AuBrWh_BASE].dentry; + au_wh_clean(h_dir, h_path, /*isdir*/0); + h_path->dentry = base[AuBrWh_PLINK].dentry; + if (do_plink) { + err = test_linkable(h_root); + if (unlikely(err)) { + err = 1; + goto out; + } + + err = au_whdir(h_dir, h_path); + if (unlikely(err)) + goto out; + wbr->wbr_plink = dget(base[AuBrWh_PLINK].dentry); + } else + au_wh_clean(h_dir, h_path, /*isdir*/1); + h_path->dentry = base[AuBrWh_ORPH].dentry; + err = au_whdir(h_dir, h_path); + if (unlikely(err)) + goto out; + wbr->wbr_orph = dget(base[AuBrWh_ORPH].dentry); + + out: + return err; +} + +/* + * for the moment, aufs supports the branch filesystem which does not support + * link(2). testing on FAT which does not support i_op->setattr() fully either, + * copyup failed. finally, such filesystem will not be used as the writable + * branch. + * + * returns tri-state, see above. + */ +static int au_wh_init_rw(struct dentry *h_root, struct au_wbr *wbr, + int do_plink, struct au_wh_base base[], + struct path *h_path) +{ + int err; + struct inode *h_dir; + + err = test_linkable(h_root); + if (unlikely(err)) { + err = 1; + goto out; + } + + /* + * todo: should this create be done in /sbin/mount.aufs helper? + */ + err = -EEXIST; + h_dir = h_root->d_inode; + if (!base[AuBrWh_BASE].dentry->d_inode) { + err = mnt_want_write(h_path->mnt); + if (!err) { + h_path->dentry = base[AuBrWh_BASE].dentry; + err = vfsub_create(h_dir, h_path, WH_MASK); + mnt_drop_write(h_path->mnt); + } + } else if (S_ISREG(base[AuBrWh_BASE].dentry->d_inode->i_mode)) + err = 0; + else + AuErr("unknown %.*s/%.*s exists\n", + AuDLNPair(h_root), AuDLNPair(base[AuBrWh_BASE].dentry)); + if (unlikely(err)) + goto out; + + h_path->dentry = base[AuBrWh_PLINK].dentry; + if (do_plink) { + err = au_whdir(h_dir, h_path); + if (unlikely(err)) + goto out; + wbr->wbr_plink = dget(base[AuBrWh_PLINK].dentry); + } else + au_wh_clean(h_dir, h_path, /*isdir*/1); + wbr->wbr_whbase = dget(base[AuBrWh_BASE].dentry); + + h_path->dentry = base[AuBrWh_ORPH].dentry; + err = au_whdir(h_dir, h_path); + if (unlikely(err)) + goto out; + wbr->wbr_orph = dget(base[AuBrWh_ORPH].dentry); + + out: + return err; +} + +/* + * initialize the whiteout base file/dir for @br. + */ +int au_wh_init(struct dentry *h_root, struct au_branch *br, + struct super_block *sb) +{ + int err, i; + const unsigned char do_plink + = !!au_opt_test(au_mntflags(sb), PLINK); + struct path path = { + .mnt = br->br_mnt + }; + struct inode *h_dir; + struct au_wbr *wbr = br->br_wbr; + static const struct qstr base_name[] = { + [AuBrWh_BASE] = { + .name = AUFS_BASE_NAME, + .len = sizeof(AUFS_BASE_NAME) - 1 + }, + [AuBrWh_PLINK] = { + .name = AUFS_PLINKDIR_NAME, + .len = sizeof(AUFS_PLINKDIR_NAME) - 1 + }, + [AuBrWh_ORPH] = { + .name = AUFS_ORPHDIR_NAME, + .len = sizeof(AUFS_ORPHDIR_NAME) - 1 + } + }; + struct au_wh_base base[] = { + [AuBrWh_BASE] = { + .name = base_name + AuBrWh_BASE, + .dentry = NULL + }, + [AuBrWh_PLINK] = { + .name = base_name + AuBrWh_PLINK, + .dentry = NULL + }, + [AuBrWh_ORPH] = { + .name = base_name + AuBrWh_ORPH, + .dentry = NULL + } + }; + + + h_dir = h_root->d_inode; + for (i = 0; i < AuBrWh_Last; i++) { + /* doubly whiteouted */ + struct dentry *d; + + d = au_wh_lkup(h_root, (void *)base[i].name, br); + err = PTR_ERR(d); + if (IS_ERR(d)) + goto out; + + base[i].dentry = d; + AuDebugOn(wbr + && wbr->wbr_wh[i] + && wbr->wbr_wh[i] != base[i].dentry); + } + + if (wbr) + for (i = 0; i < AuBrWh_Last; i++) { + dput(wbr->wbr_wh[i]); + wbr->wbr_wh[i] = NULL; + } + + err = 0; + + switch (br->br_perm) { + case AuBrPerm_RO: + case AuBrPerm_ROWH: + case AuBrPerm_RR: + case AuBrPerm_RRWH: + au_wh_init_ro(h_dir, base, &path); + break; + + case AuBrPerm_RWNoLinkWH: + err = au_wh_init_rw_nolink(h_root, wbr, do_plink, base, &path); + if (err > 0) + goto out; + else if (err) + goto out_err; + break; + + case AuBrPerm_RW: + err = au_wh_init_rw(h_root, wbr, do_plink, base, &path); + if (err > 0) + goto out; + else if (err) + goto out_err; + break; + + default: + BUG(); + } + goto out; /* success */ + + out_err: + AuErr("an error(%d) on the writable branch %.*s(%s)\n", + err, AuDLNPair(h_root), au_sbtype(h_root->d_sb)); + out: + for (i = 0; i < AuBrWh_Last; i++) + dput(base[i].dentry); + return err; +} + +/* ---------------------------------------------------------------------- */ +/* + * whiteouts are all hard-linked usually. + * when its link count reaches a ceiling, we create a new whiteout base + * asynchronously. + */ + +struct reinit_br_wh { + struct super_block *sb; + struct au_branch *br; +}; + +static void reinit_br_wh(void *arg) +{ + int err; + aufs_bindex_t bindex; + struct path h_path; + struct reinit_br_wh *a = arg; + struct au_wbr *wbr; + struct inode *dir; + struct dentry *h_root; + struct au_hinode *hdir; + + err = 0; + wbr = a->br->br_wbr; + /* big aufs lock */ + si_noflush_write_lock(a->sb); + if (!au_br_writable(a->br->br_perm)) + goto out; + bindex = au_br_index(a->sb, a->br->br_id); + if (unlikely(bindex < 0)) + goto out; + + dir = a->sb->s_root->d_inode; + /* ii_read_lock_parent(dir); */ + hdir = au_hi(dir, bindex); + h_root = au_h_dptr(a->sb->s_root, bindex); + + au_hin_imtx_lock_nested(hdir, AuLsc_I_PARENT); + wbr_wh_write_lock(wbr); + err = au_h_verify(wbr->wbr_whbase, au_opt_udba(a->sb), hdir->hi_inode, + h_root, a->br); + if (!err) { + err = mnt_want_write(a->br->br_mnt); + if (!err) { + h_path.dentry = wbr->wbr_whbase; + h_path.mnt = a->br->br_mnt; + err = vfsub_unlink(hdir->hi_inode, &h_path, /*force*/0); + mnt_drop_write(a->br->br_mnt); + } + } else { + AuWarn("%.*s is moved, ignored\n", AuDLNPair(wbr->wbr_whbase)); + err = 0; + } + dput(wbr->wbr_whbase); + wbr->wbr_whbase = NULL; + if (!err) + err = au_wh_init(h_root, a->br, a->sb); + wbr_wh_write_unlock(wbr); + au_hin_imtx_unlock(hdir); + /* ii_read_unlock(dir); */ + + out: + if (wbr) + atomic_dec(&wbr->wbr_wh_running); + atomic_dec(&a->br->br_count); + au_nwt_done(&au_sbi(a->sb)->si_nowait); + si_write_unlock(a->sb); + kfree(arg); + if (unlikely(err)) + AuIOErr("err %d\n", err); +} + +static void kick_reinit_br_wh(struct super_block *sb, struct au_branch *br) +{ + int do_dec, wkq_err; + struct reinit_br_wh *arg; + + do_dec = 1; + if (atomic_inc_return(&br->br_wbr->wbr_wh_running) != 1) + goto out; + + /* ignore ENOMEM */ + arg = kmalloc(sizeof(*arg), GFP_NOFS); + if (arg) { + /* + * dec(wh_running), kfree(arg) and dec(br_count) + * in reinit function + */ + arg->sb = sb; + arg->br = br; + atomic_inc(&br->br_count); + wkq_err = au_wkq_nowait(reinit_br_wh, arg, sb); + if (unlikely(wkq_err)) { + atomic_dec(&br->br_wbr->wbr_wh_running); + atomic_dec(&br->br_count); + kfree(arg); + } + do_dec = 0; + } + + out: + if (do_dec) + atomic_dec(&br->br_wbr->wbr_wh_running); +} + +/* ---------------------------------------------------------------------- */ + +/* + * create the whiteout @wh. + */ +static int link_or_create_wh(struct super_block *sb, aufs_bindex_t bindex, + struct dentry *wh) +{ + int err; + struct path h_path = { + .dentry = wh + }; + struct au_branch *br; + struct au_wbr *wbr; + struct dentry *h_parent; + struct inode *h_dir; + + h_parent = wh->d_parent; /* dir inode is locked */ + h_dir = h_parent->d_inode; + IMustLock(h_dir); + + br = au_sbr(sb, bindex); + h_path.mnt = br->br_mnt; + wbr = br->br_wbr; + wbr_wh_read_lock(wbr); + if (wbr->wbr_whbase) { + err = vfsub_link(wbr->wbr_whbase, h_dir, &h_path); + if (!err || err != -EMLINK) + goto out; + + /* link count full. re-initialize br_whbase. */ + kick_reinit_br_wh(sb, br); + } + + /* return this error in this context */ + err = vfsub_create(h_dir, &h_path, WH_MASK); + + out: + wbr_wh_read_unlock(wbr); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * create or remove the diropq. + */ +static struct dentry *do_diropq(struct dentry *dentry, aufs_bindex_t bindex, + unsigned int flags) +{ + struct dentry *opq_dentry, *h_dentry; + struct super_block *sb; + struct au_branch *br; + int err; + + sb = dentry->d_sb; + br = au_sbr(sb, bindex); + h_dentry = au_h_dptr(dentry, bindex); + opq_dentry = au_lkup_one(&diropq_name, h_dentry, br, /*nd*/NULL); + if (IS_ERR(opq_dentry)) + goto out; + + if (au_ftest_diropq(flags, CREATE)) { + err = link_or_create_wh(sb, bindex, opq_dentry); + if (!err) { + au_set_dbdiropq(dentry, bindex); + goto out; /* success */ + } + } else { + struct path tmp = { + .dentry = opq_dentry, + .mnt = br->br_mnt + }; + err = do_unlink_wh(au_h_iptr(dentry->d_inode, bindex), &tmp); + if (!err) + au_set_dbdiropq(dentry, -1); + } + dput(opq_dentry); + opq_dentry = ERR_PTR(err); + + out: + return opq_dentry; +} + +struct do_diropq_args { + struct dentry **errp; + struct dentry *dentry; + aufs_bindex_t bindex; + unsigned int flags; +}; + +static void call_do_diropq(void *args) +{ + struct do_diropq_args *a = args; + *a->errp = do_diropq(a->dentry, a->bindex, a->flags); +} + +struct dentry *au_diropq_sio(struct dentry *dentry, aufs_bindex_t bindex, + unsigned int flags) +{ + struct dentry *diropq, *h_dentry; + + h_dentry = au_h_dptr(dentry, bindex); + if (!au_test_h_perm_sio(h_dentry->d_inode, MAY_EXEC | MAY_WRITE)) + diropq = do_diropq(dentry, bindex, flags); + else { + int wkq_err; + struct do_diropq_args args = { + .errp = &diropq, + .dentry = dentry, + .bindex = bindex, + .flags = flags + }; + + wkq_err = au_wkq_wait(call_do_diropq, &args); + if (unlikely(wkq_err)) + diropq = ERR_PTR(wkq_err); + } + + return diropq; +} + +/* ---------------------------------------------------------------------- */ + +/* + * lookup whiteout dentry. + * @h_parent: lower parent dentry which must exist and be locked + * @base_name: name of dentry which will be whiteouted + * returns dentry for whiteout. + */ +struct dentry *au_wh_lkup(struct dentry *h_parent, struct qstr *base_name, + struct au_branch *br) +{ + int err; + struct qstr wh_name; + struct dentry *wh_dentry; + + err = au_wh_name_alloc(&wh_name, base_name); + wh_dentry = ERR_PTR(err); + if (!err) { + wh_dentry = au_lkup_one(&wh_name, h_parent, br, /*nd*/NULL); + kfree(wh_name.name); + } + return wh_dentry; +} + +/* + * link/create a whiteout for @dentry on @bindex. + */ +struct dentry *au_wh_create(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent) +{ + struct dentry *wh_dentry; + struct super_block *sb; + int err; + + sb = dentry->d_sb; + wh_dentry = au_wh_lkup(h_parent, &dentry->d_name, au_sbr(sb, bindex)); + if (!IS_ERR(wh_dentry) && !wh_dentry->d_inode) { + err = link_or_create_wh(sb, bindex, wh_dentry); + if (!err) + au_set_dbwh(dentry, bindex); + else { + dput(wh_dentry); + wh_dentry = ERR_PTR(err); + } + } + + return wh_dentry; +} + +/* ---------------------------------------------------------------------- */ + +/* Delete all whiteouts in this directory on branch bindex. */ +static int del_wh_children(struct dentry *h_dentry, struct au_nhash *whlist, + aufs_bindex_t bindex, struct au_branch *br) +{ + int err, i; + struct qstr wh_name; + char *p; + struct hlist_head *head; + struct au_vdir_wh *tpos; + struct hlist_node *pos; + struct au_vdir_destr *str; + + err = -ENOMEM; + p = __getname(); + wh_name.name = p; + if (unlikely(!wh_name.name)) + goto out; + + err = 0; + memcpy(p, AUFS_WH_PFX, AUFS_WH_PFX_LEN); + p += AUFS_WH_PFX_LEN; + head = whlist->heads; + for (i = 0; !err && i < AuSize_NHASH; i++, head++) { + hlist_for_each_entry(tpos, pos, head, wh_hash) { + if (tpos->wh_bindex != bindex) + continue; + + str = &tpos->wh_str; + if (str->len + AUFS_WH_PFX_LEN <= PATH_MAX) { + memcpy(p, str->name, str->len); + wh_name.len = AUFS_WH_PFX_LEN + str->len; + err = unlink_wh_name(h_dentry, &wh_name, br); + if (!err) + continue; + break; + } + AuIOErr("whiteout name too long %.*s\n", + str->len, str->name); + err = -EIO; + break; + } + } + __putname(wh_name.name); + + out: + return err; +} + +struct del_wh_children_args { + int *errp; + struct dentry *h_dentry; + struct au_nhash *whlist; + aufs_bindex_t bindex; + struct au_branch *br; +}; + +static void call_del_wh_children(void *args) +{ + struct del_wh_children_args *a = args; + *a->errp = del_wh_children(a->h_dentry, a->whlist, a->bindex, a->br); +} + +/* ---------------------------------------------------------------------- */ + +/* + * rmdir the whiteouted temporary named dir @h_dentry. + * @whlist: whiteouted children. + */ +int au_whtmp_rmdir(struct inode *dir, aufs_bindex_t bindex, + struct dentry *wh_dentry, struct au_nhash *whlist) +{ + int err; + struct path h_tmp; + struct inode *wh_inode, *h_dir; + struct au_branch *br; + + h_dir = wh_dentry->d_parent->d_inode; /* dir inode is locked */ + IMustLock(h_dir); + + br = au_sbr(dir->i_sb, bindex); + wh_inode = wh_dentry->d_inode; + mutex_lock_nested(&wh_inode->i_mutex, AuLsc_I_CHILD); + + /* + * someone else might change some whiteouts while we were sleeping. + * it means this whlist may have an obsoleted entry. + */ + if (!au_test_h_perm_sio(wh_inode, MAY_EXEC | MAY_WRITE)) + err = del_wh_children(wh_dentry, whlist, bindex, br); + else { + int wkq_err; + struct del_wh_children_args args = { + .errp = &err, + .h_dentry = wh_dentry, + .whlist = whlist, + .bindex = bindex, + .br = br + }; + + wkq_err = au_wkq_wait(call_del_wh_children, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } + mutex_unlock(&wh_inode->i_mutex); + + if (!err) { + h_tmp.dentry = wh_dentry; + h_tmp.mnt = br->br_mnt; + err = vfsub_rmdir(h_dir, &h_tmp); + /* d_drop(h_dentry); */ + } + + if (!err) { + if (au_ibstart(dir) == bindex) { + au_cpup_attr_timesizes(dir); + drop_nlink(dir); + } + return 0; /* success */ + } + + AuWarn("failed removing %.*s(%d), ignored\n", + AuDLNPair(wh_dentry), err); + return err; +} + +static void au_whtmp_rmdir_free_args(struct au_whtmp_rmdir_args *args) +{ + au_nhash_fin(&args->whlist); + dput(args->wh_dentry); + iput(args->dir); + kfree(args); +} + +static void call_rmdir_whtmp(void *args) +{ + int err; + struct au_whtmp_rmdir_args *a = args; + struct super_block *sb; + struct dentry *h_parent; + struct inode *h_dir; + struct au_branch *br; + struct au_hinode *hdir; + + /* rmdir by nfsd may cause deadlock with this i_mutex */ + /* mutex_lock(&a->dir->i_mutex); */ + sb = a->dir->i_sb; + si_noflush_read_lock(sb); + err = au_test_ro(sb, a->bindex, NULL); + if (unlikely(err)) + goto out; + + err = -EIO; + br = au_sbr(sb, a->bindex); + ii_write_lock_parent(a->dir); + h_parent = dget_parent(a->wh_dentry); + h_dir = h_parent->d_inode; + hdir = au_hi(a->dir, a->bindex); + au_hin_imtx_lock_nested(hdir, AuLsc_I_PARENT); + err = au_h_verify(a->wh_dentry, au_opt_udba(sb), h_dir, h_parent, br); + if (!err) { + err = mnt_want_write(br->br_mnt); + if (!err) { + err = au_whtmp_rmdir(a->dir, a->bindex, a->wh_dentry, + &a->whlist); + mnt_drop_write(br->br_mnt); + } + } + au_hin_imtx_unlock(hdir); + dput(h_parent); + ii_write_unlock(a->dir); + + out: + /* mutex_unlock(&a->dir->i_mutex); */ + au_nwt_done(&au_sbi(sb)->si_nowait); + si_read_unlock(sb); + au_whtmp_rmdir_free_args(a); + if (unlikely(err)) + AuIOErr("err %d\n", err); +} + +void au_whtmp_kick_rmdir(struct inode *dir, aufs_bindex_t bindex, + struct dentry *wh_dentry, struct au_nhash *whlist, + struct au_whtmp_rmdir_args *args) +{ + int wkq_err; + + IMustLock(dir); + + /* all post-process will be done in do_rmdir_whtmp(). */ + args->dir = au_igrab(dir); + args->bindex = bindex; + args->wh_dentry = dget(wh_dentry); + au_nhash_init(&args->whlist); + au_nhash_move(&args->whlist, whlist); + wkq_err = au_wkq_nowait(call_rmdir_whtmp, args, dir->i_sb); + if (unlikely(wkq_err)) { + AuWarn("rmdir error %.*s (%d), ignored\n", + AuDLNPair(wh_dentry), wkq_err); + au_whtmp_rmdir_free_args(args); + } +} diff -urN ./fs/aufs/whout.h ./fs/aufs/whout.h --- ./fs/aufs/whout.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/whout.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,78 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * whiteout for logical deletion and opaque directory + */ + +#ifndef __AUFS_WHOUT_H__ +#define __AUFS_WHOUT_H__ + +#ifdef __KERNEL__ + +#include +#include +#include "dir.h" + +/* whout.c */ +int au_wh_name_alloc(struct qstr *wh, const struct qstr *name); +struct au_branch; +int au_wh_test(struct dentry *h_parent, struct qstr *wh_name, + struct au_branch *br, int try_sio); +int au_diropq_test(struct dentry *h_dentry, struct au_branch *br); +struct dentry *au_whtmp_lkup(struct dentry *h_parent, struct au_branch *br, + struct qstr *prefix); +int au_whtmp_ren(struct dentry *h_dentry, struct au_branch *br); +int au_wh_unlink_dentry(struct inode *h_dir, struct path *h_path, + struct dentry *dentry); +int au_wh_init(struct dentry *h_parent, struct au_branch *br, + struct super_block *sb); + +/* diropq flags */ +#define AuDiropq_CREATE 1 +#define au_ftest_diropq(flags, name) ((flags) & AuDiropq_##name) +#define au_fset_diropq(flags, name) { (flags) |= AuDiropq_##name; } +#define au_fclr_diropq(flags, name) { (flags) &= ~AuDiropq_##name; } + +struct dentry *au_diropq_sio(struct dentry *dentry, aufs_bindex_t bindex, + unsigned int flags); +struct dentry *au_wh_lkup(struct dentry *h_parent, struct qstr *base_name, + struct au_branch *br); +struct dentry *au_wh_create(struct dentry *dentry, aufs_bindex_t bindex, + struct dentry *h_parent); + +/* real rmdir for the whiteout-ed dir */ +struct au_whtmp_rmdir_args { + struct inode *dir; + aufs_bindex_t bindex; + struct dentry *wh_dentry; + struct au_nhash whlist; +}; + +int au_whtmp_rmdir(struct inode *dir, aufs_bindex_t bindex, + struct dentry *wh_dentry, struct au_nhash *whlist); +void au_whtmp_kick_rmdir(struct inode *dir, aufs_bindex_t bindex, + struct dentry *wh_dentry, struct au_nhash *whlist, + struct au_whtmp_rmdir_args *args); + +/* ---------------------------------------------------------------------- */ + +static inline struct dentry *au_diropq_create(struct dentry *dentry, + aufs_bindex_t bindex) +{ + return au_diropq_sio(dentry, bindex, AuDiropq_CREATE); +} + +static inline int au_diropq_remove(struct dentry *dentry, aufs_bindex_t bindex) +{ + return PTR_ERR(au_diropq_sio(dentry, bindex, !AuDiropq_CREATE)); +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_WHOUT_H__ */ diff -urN ./fs/aufs/wkq.c ./fs/aufs/wkq.c --- ./fs/aufs/wkq.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/wkq.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,249 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * workqueue for asynchronous/super-io operations + * todo: try new dredential scheme + */ + +#include "aufs.h" + +/* internal workqueue named AUFS_WKQ_NAME */ +static struct au_wkq { + struct workqueue_struct *q; + + /* balancing */ + atomic_t busy; +} *au_wkq; + +struct au_wkinfo { + struct work_struct wk; + struct super_block *sb; + + unsigned int flags; /* see wkq.h */ + + au_wkq_func_t func; + void *args; + + atomic_t *busyp; + struct completion *comp; +}; + +/* ---------------------------------------------------------------------- */ + +static int enqueue(struct au_wkq *wkq, struct au_wkinfo *wkinfo) +{ + wkinfo->busyp = &wkq->busy; + if (au_ftest_wkq(wkinfo->flags, WAIT)) + return !queue_work(wkq->q, &wkinfo->wk); + else + return !schedule_work(&wkinfo->wk); +} + +static void do_wkq(struct au_wkinfo *wkinfo) +{ + unsigned int idle, n; + int i, idle_idx; + + while (1) { + if (au_ftest_wkq(wkinfo->flags, WAIT)) { + idle_idx = 0; + idle = UINT_MAX; + for (i = 0; i < aufs_nwkq; i++) { + n = atomic_inc_return(&au_wkq[i].busy); + if (n == 1 && !enqueue(au_wkq + i, wkinfo)) + return; /* success */ + + if (n < idle) { + idle_idx = i; + idle = n; + } + atomic_dec(&au_wkq[i].busy); + } + } else + idle_idx = aufs_nwkq; + + atomic_inc(&au_wkq[idle_idx].busy); + if (!enqueue(au_wkq + idle_idx, wkinfo)) + return; /* success */ + + /* impossible? */ + AuWarn1("failed to queue_work()\n"); + yield(); + } +} + +static void wkq_func(struct work_struct *wk) +{ + struct au_wkinfo *wkinfo = container_of(wk, struct au_wkinfo, wk); + + wkinfo->func(wkinfo->args); + atomic_dec(wkinfo->busyp); + if (au_ftest_wkq(wkinfo->flags, WAIT)) + complete(wkinfo->comp); + else { + kobject_put(&au_sbi(wkinfo->sb)->si_kobj); + module_put(THIS_MODULE); + kfree(wkinfo); + } +} + +/* + * Since struct completion is large, try allocating it dynamically. + */ +#if defined(CONFIG_4KSTACKS) || defined(AuTest4KSTACKS) +#define AuWkqCompDeclare(name) struct completion *comp = NULL + +static int au_wkq_comp_alloc(struct au_wkinfo *wkinfo, struct completion **comp) +{ + *comp = kmalloc(sizeof(**comp), GFP_NOFS); + if (*comp) { + init_completion(*comp); + wkinfo->comp = *comp; + return 0; + } + return -ENOMEM; +} + +static void au_wkq_comp_free(struct completion *comp) +{ + kfree(comp); +} + +#else + +/* no braces */ +#define AuWkqCompDeclare(name) \ + DECLARE_COMPLETION_ONSTACK(_ ## name); \ + struct completion *comp = &_ ## name + +static int au_wkq_comp_alloc(struct au_wkinfo *wkinfo, struct completion **comp) +{ + wkinfo->comp = *comp; + return 0; +} + +static void au_wkq_comp_free(struct completion *comp __maybe_unused) +{ + /* empty */ +} +#endif /* 4KSTACKS */ + +static void au_wkq_run(struct au_wkinfo *wkinfo) +{ + au_dbg_verify_kthread(); + INIT_WORK(&wkinfo->wk, wkq_func); + do_wkq(wkinfo); +} + +int au_wkq_wait(au_wkq_func_t func, void *args) +{ + int err; + AuWkqCompDeclare(comp); + struct au_wkinfo wkinfo = { + .flags = AuWkq_WAIT, + .func = func, + .args = args + }; + + err = au_wkq_comp_alloc(&wkinfo, &comp); + if (!err) { + au_wkq_run(&wkinfo); + /* no timeout, no interrupt */ + wait_for_completion(wkinfo.comp); + au_wkq_comp_free(comp); + } + + return err; + +} + +int au_wkq_nowait(au_wkq_func_t func, void *args, struct super_block *sb) +{ + int err; + struct au_wkinfo *wkinfo; + + atomic_inc(&au_sbi(sb)->si_nowait.nw_len); + + /* + * wkq_func() must free this wkinfo. + * it highly depends upon the implementation of workqueue. + */ + err = 0; + wkinfo = kmalloc(sizeof(*wkinfo), GFP_NOFS); + if (wkinfo) { + wkinfo->sb = sb; + wkinfo->flags = !AuWkq_WAIT; + wkinfo->func = func; + wkinfo->args = args; + wkinfo->comp = NULL; + kobject_get(&au_sbi(sb)->si_kobj); + __module_get(THIS_MODULE); + + au_wkq_run(wkinfo); + } else { + err = -ENOMEM; + atomic_dec(&au_sbi(sb)->si_nowait.nw_len); + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +void au_nwt_init(struct au_nowait_tasks *nwt) +{ + atomic_set(&nwt->nw_len, 0); + /* smp_mb();*/ /* atomic_set */ + init_waitqueue_head(&nwt->nw_wq); +} + +void au_wkq_fin(void) +{ + int i; + + for (i = 0; i < aufs_nwkq; i++) + if (au_wkq[i].q && !IS_ERR(au_wkq[i].q)) + destroy_workqueue(au_wkq[i].q); + kfree(au_wkq); +} + +int __init au_wkq_init(void) +{ + int err, i; + struct au_wkq *nowaitq; + + /* '+1' is for accounting of nowait queue */ + err = -ENOMEM; + au_wkq = kcalloc(aufs_nwkq + 1, sizeof(*au_wkq), GFP_NOFS); + if (unlikely(!au_wkq)) + goto out; + + err = 0; + for (i = 0; i < aufs_nwkq; i++) { + au_wkq[i].q = create_singlethread_workqueue(AUFS_WKQ_NAME); + if (au_wkq[i].q && !IS_ERR(au_wkq[i].q)) { + atomic_set(&au_wkq[i].busy, 0); + continue; + } + + err = PTR_ERR(au_wkq[i].q); + au_wkq_fin(); + goto out; + } + + /* nowait accounting */ + nowaitq = au_wkq + aufs_nwkq; + atomic_set(&nowaitq->busy, 0); + nowaitq->q = NULL; + /* smp_mb(); */ /* atomic_set */ + + out: + return err; +} diff -urN ./fs/aufs/wkq.h ./fs/aufs/wkq.h --- ./fs/aufs/wkq.h 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/wkq.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,72 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * workqueue for asynchronous/super-io operations + * todo: try new credentials management scheme + */ + +#ifndef __AUFS_WKQ_H__ +#define __AUFS_WKQ_H__ + +#ifdef __KERNEL__ + +#include +#include +#include +#include + +/* ---------------------------------------------------------------------- */ + +/* + * in the next operation, wait for the 'nowait' tasks in system-wide workqueue + */ +struct au_nowait_tasks { + atomic_t nw_len; + wait_queue_head_t nw_wq; +}; + +/* ---------------------------------------------------------------------- */ + +typedef void (*au_wkq_func_t)(void *args); + +/* wkq flags */ +#define AuWkq_WAIT 1 +#define au_ftest_wkq(flags, name) ((flags) & AuWkq_##name) +#define au_fset_wkq(flags, name) { (flags) |= AuWkq_##name; } +#define au_fclr_wkq(flags, name) { (flags) &= ~AuWkq_##name; } + +/* wkq.c */ +int au_wkq_wait(au_wkq_func_t func, void *args); +int au_wkq_nowait(au_wkq_func_t func, void *args, struct super_block *sb); +void au_nwt_init(struct au_nowait_tasks *nwt); +int __init au_wkq_init(void); +void au_wkq_fin(void); + +/* ---------------------------------------------------------------------- */ + +static inline int au_test_wkq(struct task_struct *tsk) +{ + return !tsk->mm && !strcmp(tsk->comm, AUFS_WKQ_NAME); +} + +static inline void au_nwt_done(struct au_nowait_tasks *nwt) +{ + if (!atomic_dec_return(&nwt->nw_len)) + wake_up_all(&nwt->nw_wq); +} + +static inline int au_nwt_flush(struct au_nowait_tasks *nwt) +{ + wait_event(nwt->nw_wq, !atomic_read(&nwt->nw_len)); + return 0; +} + +#endif /* __KERNEL__ */ +#endif /* __AUFS_WKQ_H__ */ diff -urN ./fs/aufs/xino.c ./fs/aufs/xino.c --- ./fs/aufs/xino.c 1970-01-01 01:00:00.000000000 +0100 +++ ./fs/aufs/xino.c 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,1178 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +/* + * external inode number translation table and bitmap + */ + +#include +#include +#include "aufs.h" + +ssize_t xino_fread(au_readf_t func, struct file *file, void *buf, size_t size, + loff_t *pos) +{ + ssize_t err; + mm_segment_t oldfs; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + do { + /* todo: signal_pending? */ + err = func(file, (char __user *)buf, size, pos); + } while (err == -EAGAIN || err == -EINTR); + set_fs(oldfs); + +#if 0 /* reserved for future use */ + if (err > 0) + fsnotify_access(file->f_dentry); +#endif + + return err; +} + +/* ---------------------------------------------------------------------- */ + +static ssize_t do_xino_fwrite(au_writef_t func, struct file *file, void *buf, + size_t size, loff_t *pos) +{ + ssize_t err; + mm_segment_t oldfs; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + lockdep_off(); + do { + /* todo: signal_pending? */ + err = func(file, (const char __user *)buf, size, pos); + } while (err == -EAGAIN || err == -EINTR); + lockdep_on(); + set_fs(oldfs); + +#if 0 /* reserved for future use */ + if (err > 0) + fsnotify_modify(file->f_dentry); +#endif + + return err; +} + +struct do_xino_fwrite_args { + ssize_t *errp; + au_writef_t func; + struct file *file; + void *buf; + size_t size; + loff_t *pos; +}; + +static void call_do_xino_fwrite(void *args) +{ + struct do_xino_fwrite_args *a = args; + *a->errp = do_xino_fwrite(a->func, a->file, a->buf, a->size, a->pos); +} + +ssize_t xino_fwrite(au_writef_t func, struct file *file, void *buf, size_t size, + loff_t *pos) +{ + ssize_t err; + + /* todo: signal block and no wkq? */ + /* todo: new credential scheme */ + /* + * it breaks RLIMIT_FSIZE and normal user's limit, + * users should care about quota and real 'filesystem full.' + */ + if (!au_test_wkq(current)) { + int wkq_err; + struct do_xino_fwrite_args args = { + .errp = &err, + .func = func, + .file = file, + .buf = buf, + .size = size, + .pos = pos + }; + + wkq_err = au_wkq_wait(call_do_xino_fwrite, &args); + if (unlikely(wkq_err)) + err = wkq_err; + } else + err = do_xino_fwrite(func, file, buf, size, pos); + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * create a new xinofile at the same place/path as @base_file. + */ +struct file *au_xino_create2(struct file *base_file, struct file *copy_src) +{ + struct file *file; + struct dentry *base, *dentry, *parent; + struct inode *dir; + struct qstr *name; + int err; + + base = base_file->f_dentry; + parent = base->d_parent; /* dir inode is locked */ + dir = parent->d_inode; + IMustLock(dir); + + file = ERR_PTR(-EINVAL); + name = &base->d_name; + dentry = vfsub_lookup_one_len(name->name, parent, name->len); + if (IS_ERR(dentry)) { + file = (void *)dentry; + AuErr("%.*s lookup err %ld\n", AuLNPair(name), PTR_ERR(dentry)); + goto out; + } + + /* no need to mnt_want_write() since we call dentry_open() later */ + err = vfs_create(dir, dentry, S_IRUGO | S_IWUGO, NULL); + if (unlikely(err)) { + file = ERR_PTR(err); + AuErr("%.*s create err %d\n", AuLNPair(name), err); + goto out_dput; + } + + file = dentry_open(dget(dentry), mntget(base_file->f_vfsmnt), + O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE, + current_cred()); + if (IS_ERR(file)) { + AuErr("%.*s open err %ld\n", AuLNPair(name), PTR_ERR(file)); + goto out_dput; + } + AuDebugOn(!file->f_op); + + err = vfsub_unlink(dir, &file->f_path, /*force*/0); + if (unlikely(err)) { + AuErr("%.*s unlink err %d\n", AuLNPair(name), err); + goto out_fput; + } + + if (copy_src) { + /* no one can touch copy_src xino */ + err = au_copy_file(file, copy_src, + i_size_read(copy_src->f_dentry->d_inode)); + if (unlikely(err)) { + AuErr("%.*s copy err %d\n", AuLNPair(name), err); + goto out_fput; + } + } + goto out_dput; /* success */ + + out_fput: + fput(file); + file = ERR_PTR(err); + out_dput: + dput(dentry); + out: + return file; +} + +struct au_xino_lock_dir { + struct au_hinode *hdir; + struct dentry *parent; + struct mutex *mtx; +}; + +static void au_xino_lock_dir(struct super_block *sb, struct file *xino, + struct au_xino_lock_dir *ldir) +{ + aufs_bindex_t brid, bindex; + + ldir->hdir = NULL; + bindex = -1; + brid = au_xino_brid(sb); + if (brid >= 0) + bindex = au_br_index(sb, brid); + if (bindex >= 0) { + ldir->hdir = au_hi(sb->s_root->d_inode, bindex); + au_hin_imtx_lock_nested(ldir->hdir, AuLsc_I_PARENT); + } else { + ldir->parent = dget_parent(xino->f_dentry); + ldir->mtx = &ldir->parent->d_inode->i_mutex; + mutex_lock_nested(ldir->mtx, AuLsc_I_PARENT); + } +} + +static void au_xino_unlock_dir(struct au_xino_lock_dir *ldir) +{ + if (ldir->hdir) + au_hin_imtx_unlock(ldir->hdir); + else { + mutex_unlock(ldir->mtx); + dput(ldir->parent); + } +} + +/* ---------------------------------------------------------------------- */ + +/* trucate xino files asynchronously */ + +int au_xino_trunc(struct super_block *sb, aufs_bindex_t bindex) +{ + int err; + aufs_bindex_t bi, bend; + struct au_branch *br; + struct file *new_xino, *file; + struct super_block *h_sb; + struct au_xino_lock_dir ldir; + + err = -EINVAL; + bend = au_sbend(sb); + if (unlikely(bindex < 0 || bend < bindex)) + goto out; + br = au_sbr(sb, bindex); + file = br->br_xino.xi_file; + if (!file) + goto out; + + au_xino_lock_dir(sb, file, &ldir); + /* mnt_want_write() is unnecessary here */ + new_xino = au_xino_create2(file, file); + au_xino_unlock_dir(&ldir); + err = PTR_ERR(new_xino); + if (IS_ERR(new_xino)) + goto out; + err = 0; + fput(file); + br->br_xino.xi_file = new_xino; + + h_sb = br->br_mnt->mnt_sb; + for (bi = 0; bi <= bend; bi++) { + if (unlikely(bi == bindex)) + continue; + br = au_sbr(sb, bi); + if (br->br_mnt->mnt_sb != h_sb) + continue; + + fput(br->br_xino.xi_file); + br->br_xino.xi_file = new_xino; + get_file(new_xino); + } + + out: + return err; +} + +struct xino_do_trunc_args { + struct super_block *sb; + struct au_branch *br; +}; + +static void xino_do_trunc(void *_args) +{ + struct xino_do_trunc_args *args = _args; + struct super_block *sb; + struct au_branch *br; + struct inode *dir; + int err; + aufs_bindex_t bindex; + + err = 0; + sb = args->sb; + dir = sb->s_root->d_inode; + br = args->br; + + si_noflush_write_lock(sb); + ii_read_lock_parent(dir); + bindex = au_br_index(sb, br->br_id); + err = au_xino_trunc(sb, bindex); + if (unlikely(err)) + goto out; + + if (br->br_xino.xi_file->f_dentry->d_inode->i_blocks + >= br->br_xino_upper) + br->br_xino_upper += AUFS_XINO_TRUNC_STEP; + + out: + ii_read_unlock(dir); + if (unlikely(err)) + AuWarn("err b%d, (%d)\n", bindex, err); + atomic_dec(&br->br_xino_running); + atomic_dec(&br->br_count); + au_nwt_done(&au_sbi(sb)->si_nowait); + si_write_unlock(sb); + kfree(args); +} + +static void xino_try_trunc(struct super_block *sb, struct au_branch *br) +{ + struct xino_do_trunc_args *args; + int wkq_err; + + if (br->br_xino.xi_file->f_dentry->d_inode->i_blocks + < br->br_xino_upper) + return; + + if (atomic_inc_return(&br->br_xino_running) > 1) + goto out; + + /* lock and kfree() will be called in trunc_xino() */ + args = kmalloc(sizeof(*args), GFP_NOFS); + if (unlikely(!args)) { + AuErr1("no memory\n"); + goto out_args; + } + + atomic_inc(&br->br_count); + args->sb = sb; + args->br = br; + wkq_err = au_wkq_nowait(xino_do_trunc, args, sb); + if (!wkq_err) + return; /* success */ + + AuErr("wkq %d\n", wkq_err); + atomic_dec(&br->br_count); + + out_args: + kfree(args); + out: + atomic_dec(&br->br_xino_running); +} + +/* ---------------------------------------------------------------------- */ + +static int au_xino_do_write(au_writef_t write, struct file *file, + ino_t h_ino, ino_t ino) +{ + loff_t pos; + ssize_t sz; + + pos = h_ino; + if (unlikely(au_loff_max / sizeof(ino) - 1 < pos)) { + AuIOErr1("too large hi%lu\n", (unsigned long)h_ino); + return -EFBIG; + } + pos *= sizeof(ino); + sz = xino_fwrite(write, file, &ino, sizeof(ino), &pos); + if (sz == sizeof(ino)) + return 0; /* success */ + + AuIOErr("write failed (%zd)\n", sz); + return -EIO; +} + +/* + * write @ino to the xinofile for the specified branch{@sb, @bindex} + * at the position of @h_ino. + * even if @ino is zero, it is written to the xinofile and means no entry. + * if the size of the xino file on a specific filesystem exceeds the watermark, + * try truncating it. + */ +int au_xino_write(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t ino) +{ + int err; + unsigned int mnt_flags; + struct au_branch *br; + + BUILD_BUG_ON(sizeof(long long) != sizeof(au_loff_max) + || ((loff_t)-1) > 0); + + mnt_flags = au_mntflags(sb); + if (!au_opt_test(mnt_flags, XINO)) + return 0; + + br = au_sbr(sb, bindex); + err = au_xino_do_write(au_sbi(sb)->si_xwrite, br->br_xino.xi_file, + h_ino, ino); + if (!err) { + if (au_opt_test(mnt_flags, TRUNC_XINO) + && au_test_fs_trunc_xino(br->br_mnt->mnt_sb)) + xino_try_trunc(sb, br); + return 0; /* success */ + } + + AuIOErr("write failed (%d)\n", err); + return -EIO; +} + +/* ---------------------------------------------------------------------- */ + +/* aufs inode number bitmap */ + +static const int page_bits = (int)PAGE_SIZE * BITS_PER_BYTE; +static ino_t xib_calc_ino(unsigned long pindex, int bit) +{ + ino_t ino; + + AuDebugOn(bit < 0 || page_bits <= bit); + ino = AUFS_FIRST_INO + pindex * page_bits + bit; + return ino; +} + +static void xib_calc_bit(ino_t ino, unsigned long *pindex, int *bit) +{ + AuDebugOn(ino < AUFS_FIRST_INO); + ino -= AUFS_FIRST_INO; + *pindex = ino / page_bits; + *bit = ino % page_bits; +} + +static int xib_pindex(struct super_block *sb, unsigned long pindex) +{ + int err; + loff_t pos; + ssize_t sz; + struct au_sbinfo *sbinfo; + struct file *xib; + unsigned long *p; + + sbinfo = au_sbi(sb); + MtxMustLock(&sbinfo->si_xib_mtx); + AuDebugOn(pindex > ULONG_MAX / PAGE_SIZE + || !au_opt_test(sbinfo->si_mntflags, XINO)); + + if (pindex == sbinfo->si_xib_last_pindex) + return 0; + + xib = sbinfo->si_xib; + p = sbinfo->si_xib_buf; + pos = sbinfo->si_xib_last_pindex; + pos *= PAGE_SIZE; + sz = xino_fwrite(sbinfo->si_xwrite, xib, p, PAGE_SIZE, &pos); + if (unlikely(sz != PAGE_SIZE)) + goto out; + + pos = pindex; + pos *= PAGE_SIZE; + if (i_size_read(xib->f_dentry->d_inode) >= pos + PAGE_SIZE) + sz = xino_fread(sbinfo->si_xread, xib, p, PAGE_SIZE, &pos); + else { + memset(p, 0, PAGE_SIZE); + sz = xino_fwrite(sbinfo->si_xwrite, xib, p, PAGE_SIZE, &pos); + } + if (sz == PAGE_SIZE) { + sbinfo->si_xib_last_pindex = pindex; + return 0; /* success */ + } + + out: + AuIOErr1("write failed (%zd)\n", sz); + err = sz; + if (sz >= 0) + err = -EIO; + return err; +} + +/* ---------------------------------------------------------------------- */ + +int au_xino_write0(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t ino) +{ + int err, bit; + unsigned long pindex; + struct au_sbinfo *sbinfo; + + if (!au_opt_test(au_mntflags(sb), XINO)) + return 0; + + err = 0; + if (ino) { + sbinfo = au_sbi(sb); + xib_calc_bit(ino, &pindex, &bit); + AuDebugOn(page_bits <= bit); + mutex_lock(&sbinfo->si_xib_mtx); + err = xib_pindex(sb, pindex); + if (!err) { + clear_bit(bit, sbinfo->si_xib_buf); + sbinfo->si_xib_next_bit = bit; + } + mutex_unlock(&sbinfo->si_xib_mtx); + } + + if (!err) + err = au_xino_write(sb, bindex, h_ino, 0); + return err; +} + +/* get an unused inode number from bitmap */ +ino_t au_xino_new_ino(struct super_block *sb) +{ + ino_t ino; + unsigned long *p, pindex, ul, pend; + struct au_sbinfo *sbinfo; + struct file *file; + int free_bit, err; + + if (!au_opt_test(au_mntflags(sb), XINO)) + return iunique(sb, AUFS_FIRST_INO); + + sbinfo = au_sbi(sb); + mutex_lock(&sbinfo->si_xib_mtx); + p = sbinfo->si_xib_buf; + free_bit = sbinfo->si_xib_next_bit; + if (free_bit < page_bits && !test_bit(free_bit, p)) + goto out; /* success */ + free_bit = find_first_zero_bit(p, page_bits); + if (free_bit < page_bits) + goto out; /* success */ + + pindex = sbinfo->si_xib_last_pindex; + for (ul = pindex - 1; ul < ULONG_MAX; ul--) { + err = xib_pindex(sb, ul); + if (unlikely(err)) + goto out_err; + free_bit = find_first_zero_bit(p, page_bits); + if (free_bit < page_bits) + goto out; /* success */ + } + + file = sbinfo->si_xib; + pend = i_size_read(file->f_dentry->d_inode) / PAGE_SIZE; + for (ul = pindex + 1; ul <= pend; ul++) { + err = xib_pindex(sb, ul); + if (unlikely(err)) + goto out_err; + free_bit = find_first_zero_bit(p, page_bits); + if (free_bit < page_bits) + goto out; /* success */ + } + BUG(); + + out: + set_bit(free_bit, p); + sbinfo->si_xib_next_bit++; + pindex = sbinfo->si_xib_last_pindex; + mutex_unlock(&sbinfo->si_xib_mtx); + ino = xib_calc_ino(pindex, free_bit); + AuDbg("i%lu\n", (unsigned long)ino); + return ino; + out_err: + mutex_unlock(&sbinfo->si_xib_mtx); + AuDbg("i0\n"); + return 0; +} + +/* + * read @ino from xinofile for the specified branch{@sb, @bindex} + * at the position of @h_ino. + * if @ino does not exist and @do_new is true, get new one. + */ +int au_xino_read(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino, + ino_t *ino) +{ + int err; + ssize_t sz; + loff_t pos; + struct file *file; + struct au_sbinfo *sbinfo; + + *ino = 0; + if (!au_opt_test(au_mntflags(sb), XINO)) + return 0; /* no xino */ + + err = 0; + sbinfo = au_sbi(sb); + pos = h_ino; + if (unlikely(au_loff_max / sizeof(*ino) - 1 < pos)) { + AuIOErr1("too large hi%lu\n", (unsigned long)h_ino); + return -EFBIG; + } + pos *= sizeof(*ino); + + file = au_sbr(sb, bindex)->br_xino.xi_file; + if (i_size_read(file->f_dentry->d_inode) < pos + sizeof(*ino)) + return 0; /* no ino */ + + sz = xino_fread(sbinfo->si_xread, file, ino, sizeof(*ino), &pos); + if (sz == sizeof(*ino)) + return 0; /* success */ + + err = sz; + if (unlikely(sz >= 0)) { + err = -EIO; + AuIOErr("xino read error (%zd)\n", sz); + } + + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* create and set a new xino file */ + +struct file *au_xino_create(struct super_block *sb, char *fname, int silent) +{ + struct file *file; + struct dentry *h_parent, *d; + struct inode *h_dir; + int err; + + /* + * at mount-time, and the xino file is the default path, + * hinotify is disabled so we have no inotify events to ignore. + * when a user specified the xino, we cannot get au_hdir to be ignored. + */ + file = vfsub_filp_open(fname, O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE, + S_IRUGO | S_IWUGO); + if (IS_ERR(file)) { + if (!silent) + AuErr("open %s(%ld)\n", fname, PTR_ERR(file)); + return file; + } + + /* keep file count */ + h_parent = dget_parent(file->f_dentry); + h_dir = h_parent->d_inode; + mutex_lock_nested(&h_dir->i_mutex, AuLsc_I_PARENT); + /* mnt_want_write() is unnecessary here */ + err = vfsub_unlink(h_dir, &file->f_path, /*force*/0); + mutex_unlock(&h_dir->i_mutex); + dput(h_parent); + if (unlikely(err)) { + if (!silent) + AuErr("unlink %s(%d)\n", fname, err); + goto out; + } + + err = -EINVAL; + d = file->f_dentry; + if (unlikely(sb == d->d_sb)) { + if (!silent) + AuErr("%s must be outside\n", fname); + goto out; + } + if (unlikely(au_test_fs_bad_xino(d->d_sb))) { + if (!silent) + AuErr("xino doesn't support %s(%s)\n", + fname, au_sbtype(d->d_sb)); + goto out; + } + return file; /* success */ + + out: + fput(file); + file = ERR_PTR(err); + return file; +} + +/* + * find another branch who is on the same filesystem of the specified + * branch{@btgt}. search until @bend. + */ +static int is_sb_shared(struct super_block *sb, aufs_bindex_t btgt, + aufs_bindex_t bend) +{ + aufs_bindex_t bindex; + struct super_block *tgt_sb = au_sbr_sb(sb, btgt); + + for (bindex = 0; bindex < btgt; bindex++) + if (unlikely(tgt_sb == au_sbr_sb(sb, bindex))) + return bindex; + for (bindex++; bindex <= bend; bindex++) + if (unlikely(tgt_sb == au_sbr_sb(sb, bindex))) + return bindex; + return -1; +} + +/* ---------------------------------------------------------------------- */ + +/* + * initialize the xinofile for the specified branch @br + * at the place/path where @base_file indicates. + * test whether another branch is on the same filesystem or not, + * if @do_test is true. + */ +int au_xino_br(struct super_block *sb, struct au_branch *br, ino_t h_ino, + struct file *base_file, int do_test) +{ + int err; + ino_t ino; + aufs_bindex_t bend, bindex; + struct au_branch *shared_br, *b; + struct file *file; + struct super_block *tgt_sb; + + shared_br = NULL; + bend = au_sbend(sb); + if (do_test) { + tgt_sb = br->br_mnt->mnt_sb; + for (bindex = 0; bindex <= bend; bindex++) { + b = au_sbr(sb, bindex); + if (tgt_sb == b->br_mnt->mnt_sb) { + shared_br = b; + break; + } + } + } + + if (!shared_br || !shared_br->br_xino.xi_file) { + struct au_xino_lock_dir ldir; + + au_xino_lock_dir(sb, base_file, &ldir); + /* mnt_want_write() is unnecessary here */ + file = au_xino_create2(base_file, NULL); + au_xino_unlock_dir(&ldir); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + br->br_xino.xi_file = file; + } else { + br->br_xino.xi_file = shared_br->br_xino.xi_file; + get_file(br->br_xino.xi_file); + } + + ino = AUFS_ROOT_INO; + err = au_xino_do_write(au_sbi(sb)->si_xwrite, br->br_xino.xi_file, + h_ino, ino); + if (!err) + return 0; /* success */ + + + out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* trucate a xino bitmap file */ + +/* todo: slow */ +static int do_xib_restore(struct super_block *sb, struct file *file, void *page) +{ + int err, bit; + ssize_t sz; + unsigned long pindex; + loff_t pos, pend; + struct au_sbinfo *sbinfo; + au_readf_t func; + ino_t *ino; + unsigned long *p; + + err = 0; + sbinfo = au_sbi(sb); + p = sbinfo->si_xib_buf; + func = sbinfo->si_xread; + pend = i_size_read(file->f_dentry->d_inode); + pos = 0; + while (pos < pend) { + sz = xino_fread(func, file, page, PAGE_SIZE, &pos); + err = sz; + if (unlikely(sz <= 0)) + goto out; + + err = 0; + for (ino = page; sz > 0; ino++, sz -= sizeof(ino)) { + if (unlikely(*ino < AUFS_FIRST_INO)) + continue; + + xib_calc_bit(*ino, &pindex, &bit); + AuDebugOn(page_bits <= bit); + err = xib_pindex(sb, pindex); + if (!err) + set_bit(bit, p); + else + goto out; + } + } + + out: + return err; +} + +static int xib_restore(struct super_block *sb) +{ + int err; + aufs_bindex_t bindex, bend; + void *page; + + err = -ENOMEM; + page = (void *)__get_free_page(GFP_NOFS); + if (unlikely(!page)) + goto out; + + err = 0; + bend = au_sbend(sb); + for (bindex = 0; !err && bindex <= bend; bindex++) + if (!bindex || is_sb_shared(sb, bindex, bindex - 1) < 0) + err = do_xib_restore + (sb, au_sbr(sb, bindex)->br_xino.xi_file, page); + else + AuDbg("b%d\n", bindex); + free_page((unsigned long)page); + + out: + return err; +} + +int au_xib_trunc(struct super_block *sb) +{ + int err; + ssize_t sz; + loff_t pos; + struct au_xino_lock_dir ldir; + struct au_sbinfo *sbinfo; + unsigned long *p; + struct file *file; + + err = 0; + sbinfo = au_sbi(sb); + if (!au_opt_test(sbinfo->si_mntflags, XINO)) + goto out; + + file = sbinfo->si_xib; + if (i_size_read(file->f_dentry->d_inode) <= PAGE_SIZE) + goto out; + + au_xino_lock_dir(sb, file, &ldir); + /* mnt_want_write() is unnecessary here */ + file = au_xino_create2(sbinfo->si_xib, NULL); + au_xino_unlock_dir(&ldir); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + fput(sbinfo->si_xib); + sbinfo->si_xib = file; + + p = sbinfo->si_xib_buf; + memset(p, 0, PAGE_SIZE); + pos = 0; + sz = xino_fwrite(sbinfo->si_xwrite, sbinfo->si_xib, p, PAGE_SIZE, &pos); + if (unlikely(sz != PAGE_SIZE)) { + err = sz; + AuIOErr("err %d\n", err); + if (sz >= 0) + err = -EIO; + goto out; + } + + mutex_lock(&sbinfo->si_xib_mtx); + /* mnt_want_write() is unnecessary here */ + err = xib_restore(sb); + mutex_unlock(&sbinfo->si_xib_mtx); + +out: + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * xino mount option handlers + */ +static au_readf_t find_readf(struct file *h_file) +{ + const struct file_operations *fop = h_file->f_op; + + if (fop->read) + return fop->read; + if (fop->aio_read) + return do_sync_read; + return ERR_PTR(-ENOSYS); +} + +static au_writef_t find_writef(struct file *h_file) +{ + const struct file_operations *fop = h_file->f_op; + + if (fop->write) + return fop->write; + if (fop->aio_write) + return do_sync_write; + return ERR_PTR(-ENOSYS); +} + +/* xino bitmap */ +static void xino_clear_xib(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + + sbinfo = au_sbi(sb); + sbinfo->si_xread = NULL; + sbinfo->si_xwrite = NULL; + if (sbinfo->si_xib) + fput(sbinfo->si_xib); + sbinfo->si_xib = NULL; + free_page((unsigned long)sbinfo->si_xib_buf); + sbinfo->si_xib_buf = NULL; +} + +static int au_xino_set_xib(struct super_block *sb, struct file *base) +{ + int err; + loff_t pos; + struct au_sbinfo *sbinfo; + struct file *file; + + sbinfo = au_sbi(sb); + file = au_xino_create2(base, sbinfo->si_xib); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + if (sbinfo->si_xib) + fput(sbinfo->si_xib); + sbinfo->si_xib = file; + sbinfo->si_xread = find_readf(file); + sbinfo->si_xwrite = find_writef(file); + + err = -ENOMEM; + if (!sbinfo->si_xib_buf) + sbinfo->si_xib_buf = (void *)get_zeroed_page(GFP_NOFS); + if (unlikely(!sbinfo->si_xib_buf)) + goto out_unset; + + sbinfo->si_xib_last_pindex = 0; + sbinfo->si_xib_next_bit = 0; + if (i_size_read(file->f_dentry->d_inode) < PAGE_SIZE) { + pos = 0; + err = xino_fwrite(sbinfo->si_xwrite, file, sbinfo->si_xib_buf, + PAGE_SIZE, &pos); + if (unlikely(err != PAGE_SIZE)) + goto out_free; + } + err = 0; + goto out; /* success */ + + out_free: + free_page((unsigned long)sbinfo->si_xib_buf); + sbinfo->si_xib_buf = NULL; + if (err >= 0) + err = -EIO; + out_unset: + fput(sbinfo->si_xib); + sbinfo->si_xib = NULL; + sbinfo->si_xread = NULL; + sbinfo->si_xwrite = NULL; + out: + return err; +} + +/* xino for each branch */ +static void xino_clear_br(struct super_block *sb) +{ + aufs_bindex_t bindex, bend; + struct au_branch *br; + + bend = au_sbend(sb); + for (bindex = 0; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + if (!br || !br->br_xino.xi_file) + continue; + + fput(br->br_xino.xi_file); + br->br_xino.xi_file = NULL; + } +} + +static int au_xino_set_br(struct super_block *sb, struct file *base) +{ + int err; + ino_t ino; + aufs_bindex_t bindex, bend, bshared; + struct { + struct file *old, *new; + } *fpair, *p; + struct au_branch *br; + struct inode *inode; + au_writef_t writef; + + err = -ENOMEM; + bend = au_sbend(sb); + fpair = kcalloc(bend + 1, sizeof(*fpair), GFP_NOFS); + if (unlikely(!fpair)) + goto out; + + inode = sb->s_root->d_inode; + ino = AUFS_ROOT_INO; + writef = au_sbi(sb)->si_xwrite; + for (bindex = 0, p = fpair; bindex <= bend; bindex++, p++) { + br = au_sbr(sb, bindex); + bshared = is_sb_shared(sb, bindex, bindex - 1); + if (bshared >= 0) { + /* shared xino */ + *p = fpair[bshared]; + get_file(p->new); + } + + if (!p->new) { + /* new xino */ + p->old = br->br_xino.xi_file; + p->new = au_xino_create2(base, br->br_xino.xi_file); + err = PTR_ERR(p->new); + if (IS_ERR(p->new)) { + p->new = NULL; + goto out_pair; + } + } + + err = au_xino_do_write(writef, p->new, + au_h_iptr(inode, bindex)->i_ino, ino); + if (unlikely(err)) + goto out_pair; + } + + for (bindex = 0, p = fpair; bindex <= bend; bindex++, p++) { + br = au_sbr(sb, bindex); + if (br->br_xino.xi_file) + fput(br->br_xino.xi_file); + get_file(p->new); + br->br_xino.xi_file = p->new; + } + + out_pair: + for (bindex = 0, p = fpair; bindex <= bend; bindex++, p++) + if (p->new) + fput(p->new); + else + break; + kfree(fpair); + out: + return err; +} + +void au_xino_clr(struct super_block *sb) +{ + struct au_sbinfo *sbinfo; + + au_xigen_clr(sb); + xino_clear_xib(sb); + xino_clear_br(sb); + sbinfo = au_sbi(sb); + /* lvalue, do not call au_mntflags() */ + au_opt_clr(sbinfo->si_mntflags, XINO); +} + +int au_xino_set(struct super_block *sb, struct au_opt_xino *xino, int remount) +{ + int err, skip; + struct dentry *parent, *cur_parent; + struct qstr *dname, *cur_name; + struct file *cur_xino; + struct inode *dir; + struct au_sbinfo *sbinfo; + + err = 0; + sbinfo = au_sbi(sb); + parent = dget_parent(xino->file->f_dentry); + if (remount) { + skip = 0; + dname = &xino->file->f_dentry->d_name; + cur_xino = sbinfo->si_xib; + if (cur_xino) { + cur_parent = dget_parent(cur_xino->f_dentry); + cur_name = &cur_xino->f_dentry->d_name; + skip = (cur_parent == parent + && dname->len == cur_name->len + && !memcmp(dname->name, cur_name->name, + dname->len)); + dput(cur_parent); + } + if (skip) + goto out; + } + + au_opt_set(sbinfo->si_mntflags, XINO); + dir = parent->d_inode; + mutex_lock_nested(&dir->i_mutex, AuLsc_I_PARENT); + /* mnt_want_write() is unnecessary here */ + err = au_xino_set_xib(sb, xino->file); + if (!err) + err = au_xigen_set(sb, xino->file); + if (!err) + err = au_xino_set_br(sb, xino->file); + mutex_unlock(&dir->i_mutex); + if (!err) + goto out; /* success */ + + /* reset all */ + AuIOErr("failed creating xino(%d).\n", err); + + out: + dput(parent); + return err; +} + +/* ---------------------------------------------------------------------- */ + +/* + * create a xinofile at the default place/path. + */ +struct file *au_xino_def(struct super_block *sb) +{ + struct file *file; + char *page, *p; + struct au_branch *br; + struct super_block *h_sb; + struct path path; + aufs_bindex_t bend, bindex, bwr; + + br = NULL; + bend = au_sbend(sb); + bwr = -1; + for (bindex = 0; bindex <= bend; bindex++) { + br = au_sbr(sb, bindex); + if (au_br_writable(br->br_perm) + && !au_test_fs_bad_xino(br->br_mnt->mnt_sb)) { + bwr = bindex; + break; + } + } + + if (bwr >= 0) { + file = ERR_PTR(-ENOMEM); + page = __getname(); + if (unlikely(!page)) + goto out; + path.mnt = br->br_mnt; + path.dentry = au_h_dptr(sb->s_root, bwr); + p = d_path(&path, page, PATH_MAX - sizeof(AUFS_XINO_FNAME)); + file = (void *)p; + if (!IS_ERR(p)) { + strcat(p, "/" AUFS_XINO_FNAME); + AuDbg("%s\n", p); + file = au_xino_create(sb, p, /*silent*/0); + if (!IS_ERR(file)) + au_xino_brid_set(sb, br->br_id); + } + __putname(page); + } else { + file = au_xino_create(sb, AUFS_XINO_DEFPATH, /*silent*/0); + if (IS_ERR(file)) + goto out; + h_sb = file->f_dentry->d_sb; + if (unlikely(au_test_fs_bad_xino(h_sb))) { + AuErr("xino doesn't support %s(%s)\n", + AUFS_XINO_DEFPATH, au_sbtype(h_sb)); + fput(file); + file = ERR_PTR(-EINVAL); + } + if (!IS_ERR(file)) + au_xino_brid_set(sb, -1); + } + + out: + return file; +} + +/* ---------------------------------------------------------------------- */ + +int au_xino_path(struct seq_file *seq, struct file *file) +{ + int err; + + err = au_seq_path(seq, &file->f_path); + if (unlikely(err < 0)) + goto out; + + err = 0; +#define Deleted "\\040(deleted)" + seq->count -= sizeof(Deleted) - 1; + AuDebugOn(memcmp(seq->buf + seq->count, Deleted, + sizeof(Deleted) - 1)); +#undef Deleted + + out: + return err; +} diff -urN ./include/linux/aufs_type.h ./include/linux/aufs_type.h --- ./include/linux/aufs_type.h 1970-01-01 01:00:00.000000000 +0100 +++ ./include/linux/aufs_type.h 2009-04-06 10:26:46.000000000 +0200 @@ -0,0 +1,98 @@ +/* + * Copyright (C) 2005-2009 Junjiro R. Okajima + * + * This program, aufs is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#ifndef __AUFS_TYPE_H__ +#define __AUFS_TYPE_H__ + +#include + +#define AUFS_VERSION "2-standalone.tree-29-20090406" + +/* todo? move this to linux-2.6.19/include/magic.h */ +#define AUFS_SUPER_MAGIC ('a' << 24 | 'u' << 16 | 'f' << 8 | 's') + +/* ---------------------------------------------------------------------- */ + +#ifdef CONFIG_AUFS_BRANCH_MAX_127 +/* some environments treat 'char' as 'unsigned char' by default */ +typedef signed char aufs_bindex_t; +#define AUFS_BRANCH_MAX 127 +#else +typedef short aufs_bindex_t; +#ifdef CONFIG_AUFS_BRANCH_MAX_511 +#define AUFS_BRANCH_MAX 511 +#elif defined(CONFIG_AUFS_BRANCH_MAX_1023) +#define AUFS_BRANCH_MAX 1023 +#elif defined(CONFIG_AUFS_BRANCH_MAX_32767) +#define AUFS_BRANCH_MAX 32767 +#endif +#endif + +#ifdef __KERNEL__ +#ifndef AUFS_BRANCH_MAX +#error unknown CONFIG_AUFS_BRANCH_MAX value +#endif +#endif /* __KERNEL__ */ + +/* ---------------------------------------------------------------------- */ + +#define AUFS_NAME "aufs" +#define AUFS_FSTYPE AUFS_NAME + +#define AUFS_ROOT_INO 2 +#define AUFS_FIRST_INO 11 + +#define AUFS_WH_PFX ".wh." +#define AUFS_WH_PFX_LEN ((int)sizeof(AUFS_WH_PFX) - 1) +#define AUFS_XINO_FNAME "." AUFS_NAME ".xino" +#define AUFS_XINO_DEFPATH "/tmp/" AUFS_XINO_FNAME +#define AUFS_XINO_TRUNC_INIT 64 /* blocks */ +#define AUFS_XINO_TRUNC_STEP 4 /* blocks */ +#define AUFS_DIRWH_DEF 3 +#define AUFS_RDCACHE_DEF 10 /* seconds */ +#define AUFS_WKQ_NAME AUFS_NAME "d" +#define AUFS_NWKQ_DEF 4 +#define AUFS_MFS_SECOND_DEF 30 /* seconds */ +#define AUFS_PLINK_WARN 100 /* number of plinks */ + +#define AUFS_DIROPQ_NAME AUFS_WH_PFX ".opq" /* whiteouted doubly */ +#define AUFS_WH_DIROPQ AUFS_WH_PFX AUFS_DIROPQ_NAME + +#define AUFS_BASE_NAME AUFS_WH_PFX AUFS_NAME +#define AUFS_PLINKDIR_NAME AUFS_WH_PFX "plnk" +#define AUFS_ORPHDIR_NAME AUFS_WH_PFX "orph" + +/* doubly whiteouted */ +#define AUFS_WH_BASE AUFS_WH_PFX AUFS_BASE_NAME +#define AUFS_WH_PLINKDIR AUFS_WH_PFX AUFS_PLINKDIR_NAME +#define AUFS_WH_ORPHDIR AUFS_WH_PFX AUFS_ORPHDIR_NAME + +/* branch permission */ +#define AUFS_BRPERM_RW "rw" +#define AUFS_BRPERM_RO "ro" +#define AUFS_BRPERM_RR "rr" +#define AUFS_BRPERM_WH "wh" +#define AUFS_BRPERM_NLWH "nolwh" +#define AUFS_BRPERM_ROWH AUFS_BRPERM_RO "+" AUFS_BRPERM_WH +#define AUFS_BRPERM_RRWH AUFS_BRPERM_RR "+" AUFS_BRPERM_WH +#define AUFS_BRPERM_RWNLWH AUFS_BRPERM_RW "+" AUFS_BRPERM_NLWH + +/* ---------------------------------------------------------------------- */ + +/* ioctl */ +enum { + AuCtl_PLINK_MAINT, + AuCtl_PLINK_CLEAN +}; + +#define AuCtlType 'A' +#define AUFS_CTL_PLINK_MAINT _IO(AuCtlType, AuCtl_PLINK_MAINT) +#define AUFS_CTL_PLINK_CLEAN _IO(AuCtlType, AuCtl_PLINK_CLEAN) + +#endif /* __AUFS_TYPE_H__ */ aufs2 standalone patch for linux-2.6.29 diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 55b3145..4af299a 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -459,9 +459,6 @@ out_lock: unlock_dir(lower_dir_dentry); dput(lower_new_dentry); dput(lower_old_dentry); - d_drop(lower_old_dentry); - d_drop(new_dentry); - d_drop(old_dentry); return rc; } @@ -473,7 +470,10 @@ static int ecryptfs_unlink(struct inode *dir, struct dentry *dentry) struct dentry *lower_dir_dentry; lower_dir_dentry = lock_parent(lower_dentry); + dget(lower_dentry); + atomic_inc(&lower_dentry->d_inode->i_count); rc = vfs_unlink(lower_dir_inode, lower_dentry); + dput(lower_dentry); if (rc) { printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc); goto out_unlock; @@ -484,6 +484,7 @@ static int ecryptfs_unlink(struct inode *dir, struct dentry *dentry) dentry->d_inode->i_ctime = dir->i_ctime; d_drop(dentry); out_unlock: + iput(lower_dentry->d_inode); unlock_dir(lower_dir_dentry); return rc; } @@ -569,8 +570,12 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry) fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode); dir->i_nlink = lower_dir_dentry->d_inode->i_nlink; unlock_dir(lower_dir_dentry); - if (!rc) + if (!rc) { + struct inode *inode = dentry->d_inode; + inode->i_nlink = ecryptfs_inode_to_lower(inode)->i_nlink; + inode->i_ctime = dir->i_ctime; d_drop(dentry); + } dput(dentry); return rc; } aufs2 standalone patch for linux-2.6.29 diff --git a/fs/Kconfig b/fs/Kconfig index 93945dd..75156dd 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -222,6 +222,7 @@ source "fs/qnx4/Kconfig" source "fs/romfs/Kconfig" source "fs/sysv/Kconfig" source "fs/ufs/Kconfig" +source "fs/aufs/Kconfig" endif # MISC_FILESYSTEMS diff --git a/fs/Makefile b/fs/Makefile index dc20db3..a4e9a65 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -124,3 +124,4 @@ obj-$(CONFIG_DEBUG_FS) += debugfs/ obj-$(CONFIG_OCFS2_FS) += ocfs2/ obj-$(CONFIG_BTRFS_FS) += btrfs/ obj-$(CONFIG_GFS2_FS) += gfs2/ +obj-$(CONFIG_AUFS_FS) += aufs/ diff --git a/fs/namei.c b/fs/namei.c index bbc15c2..e54d1b2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -335,6 +335,7 @@ int deny_write_access(struct file * file) return 0; } +EXPORT_SYMBOL(deny_write_access); /** * path_get - get a reference to a path @@ -1196,7 +1197,7 @@ out: * needs parent already locked. Doesn't follow mounts. * SMP-safe. */ -static struct dentry *lookup_hash(struct nameidata *nd) +struct dentry *lookup_hash(struct nameidata *nd) { int err; @@ -1205,8 +1206,9 @@ static struct dentry *lookup_hash(struct nameidata *nd) return ERR_PTR(err); return __lookup_hash(&nd->last, nd->path.dentry, nd); } +EXPORT_SYMBOL(lookup_hash); -static int __lookup_one_len(const char *name, struct qstr *this, +int __lookup_one_len(const char *name, struct qstr *this, struct dentry *base, int len) { unsigned long hash; @@ -1227,6 +1229,7 @@ static int __lookup_one_len(const char *name, struct qstr *this, this->hash = end_name_hash(hash); return 0; } +EXPORT_SYMBOL(__lookup_one_len); /** * lookup_one_len - filesystem helper to lookup single pathname component diff --git a/fs/namespace.c b/fs/namespace.c index 06f8e63..b002ec3 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -37,6 +37,9 @@ /* spinlock for vfsmount related operations, inplace of dcache_lock */ __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfsmount_lock); +#ifdef CONFIG_AUFS_EXPORT +EXPORT_SYMBOL(vfsmount_lock); +#endif static int event; static DEFINE_IDA(mnt_id_ida); diff --git a/fs/open.c b/fs/open.c index a3a78ce..d2f61c7 100644 --- a/fs/open.c +++ b/fs/open.c @@ -220,6 +220,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs, mutex_unlock(&dentry->d_inode->i_mutex); return err; } +EXPORT_SYMBOL(do_truncate); static long do_sys_truncate(const char __user *pathname, loff_t length) { diff --git a/fs/super.c b/fs/super.c index 6ce5014..af66b69 100644 --- a/fs/super.c +++ b/fs/super.c @@ -287,6 +287,7 @@ int fsync_super(struct super_block *sb) __fsync_super(sb); return sync_blockdev(sb->s_bdev); } +EXPORT_SYMBOL(fsync_super); /** * generic_shutdown_super - common helper for ->kill_sb() diff --git a/include/linux/Kbuild b/include/linux/Kbuild index 106c3ba..d0c7262 100644 --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -34,6 +34,7 @@ header-y += atmppp.h header-y += atmsap.h header-y += atmsvc.h header-y += atm_zatm.h +header-y += aufs_type.h header-y += auto_fs4.h header-y += ax25.h header-y += b1lli.h diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 23bf02f..49e5b47 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -58,7 +58,7 @@ enum lock_usage_bit #define LOCKF_USED_IN_IRQ_READ \ (LOCKF_USED_IN_HARDIRQ_READ | LOCKF_USED_IN_SOFTIRQ_READ) -#define MAX_LOCKDEP_SUBCLASSES 8UL +#define MAX_LOCKDEP_SUBCLASSES 12UL /* * Lock-classes are keyed via unique addresses, by embedding the diff --git a/include/linux/namei.h b/include/linux/namei.h index fc2e035..182d43b 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -75,6 +75,9 @@ extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry extern struct file *nameidata_to_filp(struct nameidata *nd, int flags); extern void release_open_intent(struct nameidata *); +extern struct dentry *lookup_hash(struct nameidata *nd); +extern int __lookup_one_len(const char *name, struct qstr *this, + struct dentry *base, int len); extern struct dentry *lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_noperm(const char *, struct dentry *); diff --git a/security/device_cgroup.c b/security/device_cgroup.c index 3aacd0f..b900dc3 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -507,6 +507,7 @@ acc_check: return -EPERM; } +EXPORT_SYMBOL(devcgroup_inode_permission); int devcgroup_inode_mknod(int mode, dev_t dev) { diff --git a/security/security.c b/security/security.c index c3586c0..8b0495f 100644 --- a/security/security.c +++ b/security/security.c @@ -389,6 +389,7 @@ int security_path_mkdir(struct path *path, struct dentry *dentry, int mode) return 0; return security_ops->path_mkdir(path, dentry, mode); } +EXPORT_SYMBOL(security_path_mkdir); int security_path_rmdir(struct path *path, struct dentry *dentry) { @@ -396,6 +397,7 @@ int security_path_rmdir(struct path *path, struct dentry *dentry) return 0; return security_ops->path_rmdir(path, dentry); } +EXPORT_SYMBOL(security_path_rmdir); int security_path_unlink(struct path *path, struct dentry *dentry) { @@ -403,6 +405,7 @@ int security_path_unlink(struct path *path, struct dentry *dentry) return 0; return security_ops->path_unlink(path, dentry); } +EXPORT_SYMBOL(security_path_unlink); int security_path_symlink(struct path *path, struct dentry *dentry, const char *old_name) @@ -411,6 +414,7 @@ int security_path_symlink(struct path *path, struct dentry *dentry, return 0; return security_ops->path_symlink(path, dentry, old_name); } +EXPORT_SYMBOL(security_path_symlink); int security_path_link(struct dentry *old_dentry, struct path *new_dir, struct dentry *new_dentry) @@ -419,6 +423,7 @@ int security_path_link(struct dentry *old_dentry, struct path *new_dir, return 0; return security_ops->path_link(old_dentry, new_dir, new_dentry); } +EXPORT_SYMBOL(security_path_link); int security_path_rename(struct path *old_dir, struct dentry *old_dentry, struct path *new_dir, struct dentry *new_dentry) @@ -429,6 +434,7 @@ int security_path_rename(struct path *old_dir, struct dentry *old_dentry, return security_ops->path_rename(old_dir, old_dentry, new_dir, new_dentry); } +EXPORT_SYMBOL(security_path_rename); int security_path_truncate(struct path *path, loff_t length, unsigned int time_attrs) @@ -437,6 +443,7 @@ int security_path_truncate(struct path *path, loff_t length, return 0; return security_ops->path_truncate(path, length, time_attrs); } +EXPORT_SYMBOL(security_path_truncate); #endif int security_inode_create(struct inode *dir, struct dentry *dentry, int mode) @@ -506,6 +513,7 @@ int security_inode_readlink(struct dentry *dentry) return 0; return security_ops->inode_readlink(dentry); } +EXPORT_SYMBOL(security_inode_readlink); int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd) {