When uid 0 maps host uid 0 into the child userns newer kernels require
CAP_SETFCAP be retained as this allows the caller to create fscaps that
are valid in the ancestor userns. This was a security issue (in very
rare circumstances). So whenever host uid 0 is mapped, retain
CAP_SETFCAP if the caller had it.
Userspace won't need to set CAP_SETFCAP on newuidmap as this is really
only a scenario that real root should be doing which always has
CAP_SETFCAP. And if they don't then they are in a locked-down userns.
(LXC sometimes maps host uid 0 during chown operations in a helper
userns but will not rely on newuidmap for that. But we don't want to
risk regressing callers that want to rely on this behavior.)
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Search the SELinux selabel database for the file type to be created.
Not specifying the file mode can cause an incorrect file context to be
returned.
Also prepare contexts in commonio_close() for the generic database
filename, not with the backup suffix appended, to ensure the desired
file context after the final rename.
Closes: #322
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: James Carter <jwcart2@gmail.com>
Closes#154
When starting any operation to do with subuid delegation, check
nsswitch for a module to use. If none is specified, then use
the traditional /etc/subuid and /etc/subgid files.
Currently only one module is supported, and there is no fallback
to the files on errors. Several possibilities could be considered:
1. in case of connection error, fall back to files
2. in case of unknown user, also fall back to files
etc...
When non-files nss module is used, functions to edit the range
are not supported. It may make sense to support it, but it also
may make sense to require another tool to be used.
libsubordinateio also uses the nss_ helpers. This is how for instance
lxc could easily be converted to supporting nsswitch.
Add a set of test cases, including a dummy libsubid_zzz module. This
hardcodes values such that:
'ubuntu' gets 200000 - 300000
'user1' gets 100000 - 165536
'error' emulates an nss module error
'unknown' emulates a user unknown to the nss module
'conn' emulates a connection error ot the nss module
Changes to libsubid:
Change the list_owner_ranges api: return a count instead of making the array
null terminated.
This is a breaking change, so bump the libsubid abi major number.
Rename free_subuid_range and free_subgid_range to ungrant_subuid_range,
because otherwise it's confusing with free_subid_ranges which frees
memory.
Run libsubid tests in jenkins
Switch argument order in find_subid_owners
Move the db locking into subordinateio.c
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Issue #297 reported seeing
*** Warning: Linking the shared library libsubid.la against the
*** static library ../libmisc/libmisc.a is not portable!
which commit b5fb1b38ee was supposed
to fix. But a few commits later it's back. So try to fix it
in the way the bug reporter suggested. This broke builds some
other ways, namely a few missing library specifications, so add
those.
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Do not include <sys/prctl.h> we don't have <sys/capability.h>, we don't
need prctl in that case anyway.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
In case there is a regular user with a process running on a system
with uid falling into a namespaced uid range of another user.
The user with the colliding namespaced uid range will not be
allowed to be deleted without forcing the action with -f.
The user_busy() is adjusted to check whether the suspected process
is really a namespaced process in a different namespace.
With this, it is possible for Linux distributors to store their
supplied default configuration files somewhere below /usr, while
/etc only contains the changes made by the user. The new option
--enable-vendordir defines where the shadow suite should additional
look for login.defs if this file is not in /etc.
libeconf is a key/value configuration file reading library, which
handles the split of configuration files in different locations
and merges them transparently for the application.
new switch added to useradd command, --btrfs-subvolume-home. When
specified *and* the filesystem is detected as btrfs, it will create a
subvolume for user's home instead of a plain directory. This is done via
`btrfs subvolume` command. Specifying the new switch while trying to
create home on non-btrfs will result in an error.
userdel -r will handle and remove this subvolume transparently via
`btrfs subvolume` command. Previosuly this failed as you can't rmdir a
subvolume.
usermod, when moving user's home across devices, will detect if the home
is a subvolume and issue an error messages instead of copying it. Moving
user's home (as subvolume) on same btrfs works transparently.
From <https://github.com/shadow-maint/shadow/pull/71>:
```
The third field in the /etc/shadow file (sp_lstchg) contains the date of
the last password change expressed as the number of days since Jan 1, 1970.
As this is a relative time, creating a user today will result in:
username:17238:0:99999:7:::
whilst creating the same user tomorrow will result in:
username:17239:0:99999:7:::
This has an impact for the Reproducible Builds[0] project where we aim to
be independent of as many elements the build environment as possible,
including the current date.
This patch changes the behaviour to use the SOURCE_DATE_EPOCH[1]
environment variable (instead of Jan 1, 1970) if valid.
```
This updated PR adds some missing calls to gettime (). This was originally
filed by Johannes Schauer in Debian as #917773 [2].
[0] https://reproducible-builds.org/
[1] https://reproducible-builds.org/specs/source-date-epoch/
[2] https://bugs.debian.org/917773
simplify the condition for setting the euid of the process. Now it is
always set when we are running as root, the issue was introduced with
the commit 52c081b02c
Changelog: 2018-11-24 - seh - enforce that euid only gets set to ruid if
it currently == 0 (i.e. really was setuid-*root*).
Closes: https://github.com/genuinetools/img/issues/191
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
Commit 1ecca8439d ("new[ug]idmap: not require CAP_SYS_ADMIN in the parent userNS")
does contain a wrong commit message, is lacking an explanation of the
issue, misses some simplifications and hardening features. This commit
tries to rectify this.
In (crazy) environment where all capabilities are dropped from the
capability bounding set apart from CAP_SET{G,U}ID setuid- and
fscaps-based new{g,u}idmap binaries behave differently when writing
complex mappings for an unprivileged user:
1. newuidmap is setuid
unshare -U sleep infinity &
newuidmap $? 0 100000 65536
First file_ns_capable(file, ns, CAP_SYS_ADMIN) is hit. This calls into
cap_capable() and hits the loop
for (;;) {
/* Do we have the necessary capabilities? */
if (ns == cred->user_ns)
return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
/*
* If we're already at a lower level than we're looking for,
* we're done searching.
*/
if (ns->level <= cred->user_ns->level)
return -EPERM;
/*
* The owner of the user namespace in the parent of the
* user namespace has all caps.
*/
if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
return 0;
/*
* If you have a capability in a parent user ns, then you have
* it over all children user namespaces as well.
*/
ns = ns->parent;
}
The first check fails and falls through to the end of the loop and
retrieves the parent user namespace and checks whether CAP_SYS_ADMIN is
available there which isn't.
2. newuidmap has CAP_SETUID as fscaps set
unshare -U sleep infinity &
newuidmap $? 0 100000 65536
The first file_ns_capable() check for CAP_SYS_ADMIN is passed since the
euid has not been changed:
if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
return 0;
Now new_idmap_permitted() is hit which calls ns_capable(ns->parent,
CAP_SET{G,U}ID). This check passes since CAP_SET{G,U}ID is available in
the parent user namespace.
Now file_ns_capable(file, ns->parent, CAP_SETUID) is hit and the
cap_capable() loop (see above) is entered again. This passes
if (ns == cred->user_ns)
return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
since CAP_SET{G,U}ID is available in the parent user namespace. Now the
mapping can be written.
There is no need for this descrepancy between setuid and fscaps based
new{g,u}idmap binaries. The solution is to do a
seteuid() back to the unprivileged uid and PR_SET_KEEPCAPS to keep
CAP_SET{G,U}ID. The seteuid() will cause the
file_ns_capable(file, ns, CAP_SYS_ADMIN) check to pass and the
PR_SET_KEEPCAPS for CAP_SET{G,U}ID will cause the CAP_SET{G,U}ID to
pass.
Fixes: 1ecca8439d ("new[ug]idmap: not require CAP_SYS_ADMIN in the parent userNS")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
if the euid!=owner of the userns, the kernel returns EPERM when trying
to write the uidmap and there is no CAP_SYS_ADMIN in the parent
namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This allows shadow-utils to build on systems like Adélie, which have no
<utmp.h> header or `struct utmp`. We use a <utmpx.h>-based daemon,
utmps[1], which uses `struct utmpx` only.
Tested both `login` and `logoutd` with utmps and both work correctly.
[1]: http://skarnet.org/software/utmps/
The third field in the /etc/shadow file (sp_lstchg) contains the date of
the last password change expressed as the number of days since Jan 1, 1970.
As this is a relative time, creating a user today will result in:
username:17238:0:99999:7:::
whilst creating the same user tomorrow will result in:
username:17239:0:99999:7:::
This has an impact for the Reproducible Builds[0] project where we aim to
be independent of as many elements the build environment as possible,
including the current date.
This patch changes the behaviour to use the SOURCE_DATE_EPOCH[1]
environment variable (instead of Jan 1, 1970) if valid.
[0] https://reproducible-builds.org/
[1] https://reproducible-builds.org/specs/source-date-epoch/
Signed-off-by: Chris Lamb <lamby@debian.org>
Previously, the allocation was optimized for an outdated
deployment style (that of /etc/group alongside nss_db). The issue
here is that this results in extremely poor performance when using
SSSD, Winbind or nss_ldap.
There were actually two serious bugs here that have been addressed:
1) Running getgrent() loops won't work in most SSSD or Winbind
environments, as full group enumeration is disabled by default.
This could easily result in auto-allocating a group that was
already in use. (This might result in a security issue as well, if
the shared GID is a privileged group).
2) For system groups, the loop was always iterating through the
complete SYS_GID_MIN->SYS_GID_MAX range. On SSSD and Winbind, this
means hundreds of round-trips to LDAP (unless the GIDs were
specifically configured to be ignored by the SSSD or winbindd).
To a user with a slow connection to their LDAP server, this would
appear as if groupadd -r was hung. (Though it would eventually
complete).
This patch changes the algorithm to be more favorable for LDAP
environments, at the expense of some performance when using nss_db.
Given that the DB is a local service, this should have a negligible
effect from a user's perspective.
With the new algorithm, we simply first iterate through all entries
in the local database with gr_next(), recording the IDs that are in
use. We then start from the highest presumed-available entry and
call getgrgid() to see if it is available. We continue this until
we come to the first unused GID. We then select that and return it.
If we make it through all the remaining IDs without finding a free
one, we start over from the beginning of the range and try to find
room in one of the gaps in the range.
The patch was originally written by Stephen Gallagher and applied
identically also to the user allocation by Tomáš Mráz.
Signed-off-by: Serge Hallyn <serge@hallyn.com>