opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-22 14:49:36 -04:00

Author	SHA1	Message	Date
Mateusz Guzik	61852185ba	locks: change sleep_cnt and spin_cnt types to u_int Both variables are uint64_t, but they only count spins or sleeps. All reasonable values which we can get here comfortably hit in 32-bit range. Suggested by: kib MFC after: 1 week	2016-07-31 12:11:55 +00:00
Mateusz Guzik	e0c45af904	sx: increment spin_cnt before cpu_spinwait in xlock The change is a no-op only done for consistency with the rest of the file.	2016-07-30 22:23:31 +00:00
Mateusz Guzik	7a54be1870	rwlock: s/READER/WRITER/ in wlock lockstat annotation	2016-07-30 22:21:48 +00:00
Konstantin Belousov	50c22263bb	Cache getbintime(9) answer in timehands, similarly to getnanotime(9) and getmicrotime(9). Suggested and reviewed by: bde (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2016-07-30 09:25:57 +00:00
John Baldwin	e2325d8285	Don't treat NOCPU as a valid CPU to CPU_ISSET. If a thread is created bound to a cpuset it might already be bound before it's very first timeslice, and td_lastcpu will be NOCPU in that case. MFC after: 1 week	2016-07-29 20:19:14 +00:00
John Baldwin	005ce8e4e6	Fix locking issues with aio_fsync(). - Use correct lock in aio_cancel_sync when dequeueing job. - Add _locked variants of aio_set/clear_cancel_function and use those to avoid lock recursion when adding and removing fsync jobs to the per-process sync queue. - While here, add a basic test for aio_fsync(). PR: 211390 Reported by: Randy Westlund <rwestlun@gmail.com> MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7339	2016-07-29 18:26:15 +00:00
Brooks Davis	40018b91dd	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7353	2016-07-28 21:29:04 +00:00
Ed Schouten	5590eb985e	Regenerate system call table for r303435.	2016-07-28 12:22:34 +00:00
Ed Schouten	d9c4cd2fbc	Change the return type of msgrcv() to ssize_t as required by POSIX. It looks like the msgrcv() system call is already written in such a way that the size is internally computed as a size_t and written into all of td_retval[0]. This means that it is effectively already returning ssize_t. It's just that the userspace prototype doesn't match up.	2016-07-28 12:22:01 +00:00
Konstantin Belousov	2d19b736ed	Rewrite subr_sleepqueue.c use of callouts to not depend on the specifics of callout KPI. Esp., do not depend on the exact interface of callout_stop(9) return values. The main change is that instead of requiring precise callouts, code maintains absolute time to wake up. Callouts now should ensure that a wake occurs at the requested moment, but we can tolerate both run-away callout, and callout_stop(9) lying about running callout either way. As consequence, it removes the constant source of the bugs where sleepq_check_timeout() causes uninterruptible thread state where the thread is detached from CPU, see e.g. r234952 and r296320. Patch also removes dual meaning of the TDF_TIMEOUT flag, making code (IMO much) simpler to reason about. Tested by: pho Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 09:09:55 +00:00
Konstantin Belousov	a9e182e895	Extract the calculation of the callout fire time into the new function callout_when(9). See the man page update for the description of the intended use. Tested by: pho Reviewed by: jhb, bjk (man page updates) Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 08:57:01 +00:00
Konstantin Belousov	b7a25e63b6	When a debugger attaches to the process, SIGSTOP is sent to the target. Due to a way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again. As a minor refactoring, some bits of the common attach code is moved to new helper proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256	2016-07-28 08:41:13 +00:00
Stephen J. Kiernan	4ac21b4f09	Prepare for network stack as a module - Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the INET and INET6-specific code from the rest of the prot code (It is only used by the network stack, so it makes sense for it to live with the other network stack code.) - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to cr_canseeothergids, make them non-static, and add prototypes (so they can be seen/called by in_prot.c functions.) - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an unused variable. Reviewed by: gnn, jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2901	2016-07-27 20:34:09 +00:00
John Baldwin	b9a53e161b	Adjust tests in fsync job scheduling loop to reduce indentation.	2016-07-27 19:31:25 +00:00
Ed Maste	18c23dffac	ANSIfy kern_proc.c and delete register keyword Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6478	2016-07-27 14:27:08 +00:00
Konstantin Belousov	1822421c0b	Remove Giant from settime(), tc_setclock_mtx guards tc_windup() calls, and there is no other issues with parallel settime(). Remove spl() vestiges there as well. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed wit: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:54:24 +00:00
Konstantin Belousov	5760b029ee	Prevent parallel tc_windup() calls, both parallel top-level calls from setclock() and from simultaneous top-level and interrupt. For this, tc_windup() is protected with a tc_setclock_mtx spinlock, in the try mode when called from hardclock interrupt. If spinlock cannot be obtained without spinning from the interrupt context, this means that top-level executes tc_windup() on other core and our try may be avoided. The boottimebin and boottime variables should be adjusted from tc_windup(). To be correct, they must be part of the timehands and read using lockless protocol. Remove the globals and reimplement the getboottime(9)/getboottimebin(9) KPI using the timehands read protocol. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed wit: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:49:41 +00:00
Konstantin Belousov	4493f659e5	Fix a bug in r302252. Change ntpadj_lock to spinlock always, and rename stuff removing ADJ/adj from the names. ntp_update_second() requires ntp_lock and is called from the tc_windup(), so ntp_lock must be a spinlock. Add missed lock to ntp_update_second(). Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:40:06 +00:00
Konstantin Belousov	21547fc7ca	Reduce the resettodr_lock scope to only CLOCK_SETTIME() call. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:34:25 +00:00
Konstantin Belousov	4d29106e55	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:33:33 +00:00
Konstantin Belousov	a83c016f00	Reduce number of timehands to just two. This is useful because consumers can now be only one tc_windup() call late. Use C99 initialization. Tested by: pho (as part of the whole patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:27:52 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and bootimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Stephen J. Kiernan	cc37baea09	Add the NUM_CORE_FILES kernel config option which specifies the limit for the number of core files allowed by a particular process when using the %I core file name pattern. Sanity check at compile time to ensure the value is within the valid range of 0-10. Reviewed by: jtl, sjg Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6812	2016-07-27 03:21:02 +00:00
Ed Schouten	f63cd251b2	Add shmatt_t. It looks like our "struct shmid_ds::shm_nattch" deviates from the standard in the sense that it is a signed integer, whereas POSIX requires that it is unsigned, having a special type shmatt_t. Patch up our native and 32-bit copies to use a new shmatt_t that is an unsigned integer. As it's unsigned, we can relax the comparisons that are performed on it. Leave the Linux, iBCS2, etc. copies of the structure alone. Reviewed by: ngie Differential Revision: https://reviews.freebsd.org/D6655	2016-07-26 17:23:49 +00:00
Conrad Meyer	af326ace9d	devfs: Move most ioctl logic down to vnode layer Devfs' file layer ioctl is now just a thin shim around the vnode layer. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7286	2016-07-25 16:28:02 +00:00
Konstantin Belousov	90b581f2cc	Implement mtx_trylock_spin(9). Discussed with: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7192	2016-07-23 05:30:55 +00:00
John Baldwin	9c20dc9963	Add more documentation regarding unsafe AIO requests. The asynchronous I/O changes made previously result in different behavior out of the box. Previously all AIO requests failed with ENOSYS / SIGSYS unless aio.ko was explicitly loaded. Now, some AIO requests complete and others ("unsafe" requests) fail with EOPNOTSUPP. Reword the introductory paragraph in aio(4) to add a general description of AIO before describing the vfs.aio.enable_unsafe sysctl. Remove the ENOSYS error description from aio_fsync(2), aio_read(2), and aio_write(2) and replace it with a description of EOPNOTSUPP. Remove the ENOSYS error description from aio_mlock(2). Log a message to the system log the first time a process requests an "unsafe" AIO request that fails with EOPNOTSUPP. This is modeled on the log message used for processes using the legacy pty devices. Reviewed by: kib (earlier version) MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7151	2016-07-21 22:49:47 +00:00
Konstantin Belousov	492fe1b774	Hide counted_warning(9) under #ifdef _KERNEL braces, to allow building subr_prf.c in userspace for libsbuf. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-21 17:59:30 +00:00
Konstantin Belousov	9fe297bbdc	Declare aio requests on files from local filesystems safe. Two notes: - I allow AIO on reclaimed vnodes, since it is deterministically terminated fast. - devfs mounts are marked as MNT_LOCAL, but device vnodes have type VCHR, so the slow device io is not allowed. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7273	2016-07-21 17:07:06 +00:00
Konstantin Belousov	9837947b07	Provide counter_warning(9) KPI which allows to issue limited number of warnings for some kernel events, mostly intended for the use of obsoleted or otherwise undersired interfaces. This is an abstracted and race-expelled code from compat pty driver. Requested and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7270	2016-07-21 16:34:56 +00:00
Conrad Meyer	1005d8aff4	imgact_elf: Rename the segment iterator to match reality The each_writable_segment routine evaluates segments on a slightly little more nuanced metric than simply "writable" or not. Rename the function to more closely match its behavior (each_dumpable_segment). Suggested by: jhb Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:51:33 +00:00
Conrad Meyer	f3325003d9	ANSI-fy imgact_elf.c Sponsored by: EMC / Isilon Storage Division	2016-07-20 22:46:56 +00:00
Conrad Meyer	07f825e871	Fix DEBUG build on 64-bit arch after r303099 Reported by: Larry Rosenman <ler at lerctr.org>	2016-07-20 18:11:22 +00:00
Conrad Meyer	c17b0bd2a6	Extend ELF coredump to support more than 65535 segments The ELF e_phnum field is only 16 bits wide. To support more than 65535 segments (program headers), Sun's "Linker and Libraries Guide" table 7-7 (or 12-7, depending on document version) prescribes a special first section header where sh_info represents the real number of program headers. Test code to follow, when it is ready. Reference: http://docs.oracle.com/cd/E18752_01/pdf/817-1984.pdf Reviewed by: emaste, markj Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7255	2016-07-20 16:59:36 +00:00
Gleb Smirnoff	9f3391243b	Redo the r302894: the very new value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky Tested by: Larry Rosenman <ler lerctr.org> PR: 210884	2016-07-20 16:48:25 +00:00
Gleb Smirnoff	47e4280922	Revert r303037. It re-introduces the panic with TCP timers. Agreed by: rrs, re (gjb)	2016-07-20 16:44:22 +00:00
Randall Stewart	3d84a18803	This reverts out Gleb's changes and adds three small fixes that I think closes up the races Gleb was looking for. This is running quite nicely in Netflix and now no longer causes TCP-tcb leaks. Differential Revision: 7135	2016-07-19 18:31:19 +00:00
John Baldwin	ccb83afd81	Include process IDs in core dumps. When threads were added to the kernel, the pr_pid member of the NT_PRSTATUS note was repurposed to store LWP IDs instead of process IDs. However, the process ID was no longer recorded in core dumps. This change adds a pr_pid field to prpsinfo (NT_PRSINFO). Rather than bumping the prpsinfo version number, note parsers can use the note's payload size to determine if pr_pid is present. Reviewed by: kib, emaste (older version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7117	2016-07-18 15:14:23 +00:00
John Baldwin	fc4f075a1a	Add PTRACE_VFORK to trace vfork events. First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when the new child was created via vfork() rather than fork(). Second, a new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK event mask. This new stop is reported after the vfork parent resumes due to the child calling exit or exec. Debuggers can use this stop to reinsert breakpoints in the vfork parent process before it resumes. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7045	2016-07-18 14:53:55 +00:00
Konstantin Belousov	77d6809483	The assertion re-added in r302614 was triggered when stopping signal is delivered to vforked child. Issue is that we avoid stopping such children in issignal() to not block parents. But executed AST, which ignored stops, leaves the child with the signal pending but no AST pending. On first exec after vfork(), call signotify() to handle pending reenabled signals. Adjust the assert to not check vfork children until exec. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-18 10:53:47 +00:00
Gleb Smirnoff	809a9d1353	Revert the last commit. It must get more review and testing first.	2016-07-18 09:29:08 +00:00
Gleb Smirnoff	ef58c6a7a3	Redo the r302894: the very new value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky PR: 210884	2016-07-18 09:26:06 +00:00
Konstantin Belousov	56d48d6dcb	Fix another bug after r302350. Reported and tested by: pho PR: 210884 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-07-18 04:30:34 +00:00
Konstantin Belousov	86f1146329	Another issue reported on http://seclists.org/oss-sec/2016/q3/68 is that struct kevent member ident has uintptr_t type, which is silently truncated to int in the call to fget(). Explicitely check for the valid range. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-16 13:24:58 +00:00
Konstantin Belousov	f470cca578	In ptrace_vm_entry(), do not call vmspace_free() while owning a vm object lock. The vmspace_free() operations might need to lock map, object etc on last dereference. Postpone the free until object's inspection is done. Reported and tested by: will Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-15 23:26:33 +00:00
John Baldwin	8d570f64aa	Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events include: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace syscam call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044	2016-07-15 15:32:09 +00:00
Gleb Smirnoff	2138e263cb	Fix regression introduced by r302350. The change of return value for a callout that wasn't scheduled at all was unintentional and yielded in several panics. PR: 210884	2016-07-15 09:28:32 +00:00
Konstantin Belousov	d7e8cfd63d	Do not allow creation of char or block special nodes with VNOVAL dev_t. As was reported on http://seclists.org/oss-sec/2016/q3/68, tmpfs code contains assertion that rdev != VNOVAL. On FreeBSD, there is no other consequences except triggering the assert. To be compatible with systems where device nodes have some significance, reject mknod(2) call with dev == VNOVAL at the syscall level. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-15 09:23:18 +00:00
John Baldwin	c77547d2f9	Include command line arguments in core dump process info. Fill in pr_psargs in the NT_PRSINFO ELF core dump note with command line arguments. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7116	2016-07-14 23:20:05 +00:00
Mark Johnston	0649caae32	Let DDB's buf printer handle NULL pointers in the buf page array. A buf's b_pages and b_npages fields may be inconsistent after a panic. For instance, vfs_vmio_invalidate() sets b_npages to zero only after all pages are unwired and their page array entries are cleared. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-07-14 18:49:05 +00:00
Eric Badger	fdb6320d45	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
Konstantin Belousov	de56aee0bf	Trace timeval parameters to the getitimer(2) and setitimer(2) syscalls. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7158	2016-07-13 14:37:58 +00:00
Konstantin Belousov	8f01bee46b	Revive the check, disabled in r197963. Despite the implication (process has pending signals -> the current thread marked for AST and has TDF_NEEDSIGCHK set) is not true due to other thread might manipulate its signal blocking mask, it should still hold for the single-threaded processes. Enable check for the condition for single-threaded case, and replicate it from userret() to ast() as well, where we check that ast indeed has no signal to deliver. Note that the check is under DIAGNOSTIC, it is not enabled for INVARIANTS but !DIAGNOSTIC since it imposes too heavy-weight locking for day-to-day used debugging kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-12 03:53:15 +00:00
Konstantin Belousov	4f4c35bb38	Add assert to complement r302328. AST must not execute with TDF_SBDRY or TDF_SEINTR/TDF_SERESTART thread flags set, which is asserted in userret(). As the consequence, -1 return from cursig() must not be possible. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-12 03:52:05 +00:00
Nathan Whitehorn	8c636a11dc	Remove assumptions in MI code that the BSP is CPU 0. MFC after: 2 weeks	2016-07-11 21:25:28 +00:00
Konstantin Belousov	725496ced5	Fix grammar. Submitted by: alc MFC after: 2 weeks	2016-07-11 17:04:22 +00:00
Konstantin Belousov	19efd8a5a8	In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called, if vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate(). The vnode_destroy_object(), when calling into vm_object_terminate(), must be able to flush buffers. BO_DEAD purpose is to quickly destroy buffers on write when the underlying vnode is not operable any more (one example is the devfs node after geom is gone). Setting BO_DEAD for reclaiming vnode before object is terminated is premature, and results in unability to flush buffers with live SU dependencies from vinvalbuf() in vm_object_terminate(). Reported by: David Cross <dcrosstech@gmail.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-11 14:19:09 +00:00
Robert Watson	5fa69ff015	In process-descriptor close(2) and fstat(2), audit target process information. pgkill(2) already audits target process ID. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 14:17:36 +00:00
Robert Watson	e5ec733909	Do allow auditing of read(2) and write(2) system calls, by assigning those system calls audit event identifiers AUE_READ and AUE_WRITE. While auditing file-descriptor I/O is not required by the Common Criteria, in practice this proves useful for both live and forensic analysis. NB: freebsd32 already assigns AUE_READ and AUE_WRITE to read(2) and write(2). MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 13:42:33 +00:00
Robert Watson	8ec75c0fc3	Audit the file-descriptor number argument for openat(2). Remove a comment about the desirability of auditing the number, as it was in fact in the wrong place (in the common path for open(2) and openat(2), and only the latter accepts a file-descriptor argument). Where other ABIs support openat(2), it may be necessary to do additional argument auditing as it is not performed in kern_openat(9). MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 09:50:21 +00:00
Robert Watson	51d1f69069	Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2). This auditing is not required by the Common Criteria (and hence was not being performed), but is valuable in both contemporary live analysis and forensic use cases. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 08:04:02 +00:00
Edward Tomasz Napierala	debc480e03	Add new unmount(2) flag, MNT_NONBUSY, to check whether there are any open vnodes before proceeding. Make autounmound(8) use this flag. Without it, even an unsuccessfull unmount causes filesystem flush, which interferes with normal operation. Reviewed by: kib@ Approved by: re (gjb@) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7047	2016-07-07 09:03:57 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Gleb Smirnoff	d153eeee97	The paradigm of a callout is that it has three consequent states: not scheduled -> scheduled -> running -> not scheduled. The API and the manual page assume that, some comments in the code assume that, and looks like some contributors to the code also did. The problem is that this paradigm isn't true. A callout can be scheduled and running at the same time, which makes API description ambigouous. In such case callout_stop() family of functions/macros should return 1 and 0 at the same time, since it successfully unscheduled future callout but the current one is running. Before this change we returned 1 in such a case, with an exception that if running callout was migrating we returned 0, unless CS_MIGRBLOCK was specified. With this change, we now return 0 in case if future callout was unscheduled, but another one is still in action, indicating to API users that resources are not yet safe to be freed. However, the sleepqueue code relies on getting 1 return code in that case, and there already was CS_MIGRBLOCK flag, that covered one of the edge cases. In the new return path we will also use this flag, to keep sleepqueue safe. Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to migration edge case, rename it to CS_EXECUTING. This change fixes panics on a high loaded TCP server. Reviewed by: jch, hselasky, rrs, kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7042	2016-07-05 18:47:17 +00:00
Gleb Smirnoff	a0d20ecb94	Compile in the kassert_panic() function with INVARIANT_SUPPORT option, not INVARIANTS. The function is required if we want to load in a module that is compiled with INVARIANTS. Reviewed by: jhb Approved by: re (gjb)	2016-07-05 18:34:34 +00:00
Mark Johnston	f61d6c5a5b	Ensure that spinlock sections are balanced even after a panic. vpanic() uses spinlock_enter() to disable interrupts before dumping core. However, when the scheduler is stopped and INVARIANTS is not configured, thread_lock() does not acquire a spinlock section, while thread_unlock() releases one. This can result in interrupts staying enabled while the kernel dumps core, complicating post-mortem analysis of the crash. Approved by: re (gjb) MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-07-05 17:59:04 +00:00
Robert Watson	971711fb7c	Call audit hooks to capture vnode attributes for three file-descriptor method implementations: fstat(2), close(2), and poll(2). This change synchronises auditing here with similar auditing for VFS-specific system calls such as stat(2) that audit more complete vnode information. Sponsored by: DARPA, AFRL Approved by: re (kib) MFC after: 1 week	2016-07-05 16:37:01 +00:00
Ed Maste	1cbb879df8	add description for debug.elf{32,64}_legacy_coredump sysctl Approved by: re (kib) MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-07-05 14:46:06 +00:00
Konstantin Belousov	46e47c4f8d	Provide helper macros to detect 'non-silent SBDRY' state and to calculate appropriate return value for stops. Simplify the code by using them. Fix typo in sig_suspend_threads(). The thread which sleep must be aborted is td2. () In issignal(), when handling stopping signal for thread in TD_SBDRY_INTR state, do not stop, this is wrong and fires assert. This is yet another place where execution should be forced out of SBDRY-protected region. For such case, return -1 from issignal() and translate it to corresponding error code in sleepq_catch_signals(). Assert that other consumers of cursig() are not affected by the new return value. () Micro-optimize, mostly VFS and VOP methods, by avoiding calling the functions when SIGDEFERSTOP_NOP non-change is requested. (*) Reported and tested by: pho () Requested by: bde (**) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-07-03 18:19:48 +00:00
Konstantin Belousov	d12d6a8476	Remove racy assert. The thread which changes vnode usecount from 0 to 1 does it under the vnode interlock, but the interlock is not owned by the asserting thread. As result, we might read increased use counter but also still see VI_OWEINACT. In collaboration with: nwhitehorn Hardware donated by: IBM LTC Sponsored by: The FreeBSD Foundation (kib) Approved by: re (gjb)	2016-07-03 01:56:48 +00:00
Konstantin Belousov	e18ee4957d	When a process knote was attached to the process which is already exiting, the knote is activated immediately. If the exit1() later activates knotes, such knote is attempted to be activated second time. Detect the condition by zeroed kn_ptr.p_proc pointer, and avoid excessive activation. Before r302235, such knotes were removed from the knlist immediately upon activation. Reported by: truckman Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-07-01 20:11:28 +00:00
Konstantin Belousov	364c516cff	Currently the ntptime code and resettodr() are Giant-locked. In particular, the Giant is supposed to protect against parallel ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does copyout(9) under Giant and then examines time_status to return syscall result. Since copyout(9) could sleep, the syscall result might be inconsistent. Another and more important issue is that if PPS is configured, hardpps(9) is executed without any protection against the parallel top-level code invocation. Potentially, this may result in the inconsistent state of the ntptime state variables, but I cannot say how serious such distortion is. The non-functional splclock() call in sys_ntp_adjtime() protected against clock interrupts calling hardpps() in the pre-SMP era. Modernize the locking. A mutex protects ntptime data. Due to the hardpps() KPI legitimately serving from the interrupt filters (and e.g. uart(4) does call it from filter), the lock cannot be sleepable mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex to reduce interrupt latency. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6825	2016-06-28 16:43:23 +00:00
Konstantin Belousov	3a0e6f920a	Do not use Giant to prevent parallel calls to CLOCK_SETTIME(). Use private mtx in resettodr(), no implementation of CLOCK_SETTIME() is allowed to sleep. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation Approved by: re (gjb) X-Differential revision: https://reviews.freebsd.org/D6825	2016-06-28 16:42:40 +00:00
Konstantin Belousov	6f56cb8dbf	Complete r302215. TDF_SBDRY \| TDF_SERESTART and TDF_SBDRY \| TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as if TDF_SBDRY is not set for STOP signal delivery. The only difference is that sig_suspend_threads() should abort the sleep instead of doing immediate suspension. Reported by: ngie Sponsored by: The FreeBSD Foundation MFC after: 12 days Approved by: re (gjb)	2016-06-28 16:41:50 +00:00
Konstantin Belousov	9eb3f14324	Fix userspace build after r302235: do not expose bool field of the structure, change it to int. The real fix is to sanitize user-visible definitions in sys/event.h, e.g. the affected struct knlist is of no use for userspace programs. Reported and tested by: jkim Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-27 23:34:53 +00:00
Konstantin Belousov	9e590ff04b	When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859	2016-06-27 21:52:17 +00:00
Konstantin Belousov	883a5a4a6a	When sleeping waiting for either local or remote advisory lock, interrupt sleeps with the ERESTART on the suspension attempts. Otherwise, single-threading requests are deferred until the locks are granted for NFS files, which causes hangs. When retrying local registration of the remotely-granted adv lock, allow full suspension and check for suspension, for usual reasons. Reported by: markj, pho Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-26 20:08:42 +00:00
Konstantin Belousov	3a1e5dd8e6	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-26 20:07:24 +00:00
Konstantin Belousov	8a06de9e92	Do not clear robust lists pointers on fork. The forked child thread lists must be functional. Reported by: Daniel Engberg <daniel.engberg.lists@pyret.net>, Guy Yur <guyyur@gmail.com> Tested by: Guy Yur <guyyur@gmail.com> Sponsored by: The FreeBSD Foundation Approved by: re (gjb), including the KBI change	2016-06-25 11:31:25 +00:00
Jilles Tjoelker	6ea906eec0	posixshm: Fix lock leak when mac_posixshm_check_read rejects read. While reading the code, I noticed that shm_read() returns without unlocking foffset and rangelock if mac_posixshm_check_read() rejects the read. Reviewed by: kib, jhb, rwatson Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D6927	2016-06-23 20:59:13 +00:00
Brooks Davis	a72c64b0b6	Generate syscall tables and update pipe() implementation after r302094. Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:18:19 +00:00
Brooks Davis	e16e64098c	Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Commit with regenerated files and implementation to follow. Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:15:59 +00:00
Brooks Davis	e52e02ba24	Add support for COMPAT10 keywords in syscalls.master. Approved by: re (gjb) Sponsored by: DARPA, AFRL	2016-06-22 21:12:53 +00:00
John Baldwin	b1012d8036	Account for AIO socket operations in thread/process resource usage. File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thread that completes an AIO request via aio_return() or aio_waitcomplete(). This change extends AIO jobs to store counts of received/sent messages and updates socket backends to set these counts accordingly. Note that the socket backends are careful to only charge a single messages for each AIO request even though a single request on a blocking socket might invoke sosend or soreceive multiple times. This is to mimic the resource accounting of synchronous read/write. Adjust the UNIX socketpair AIO test to verify that the message resource usage counts update accordingly for aio_read and aio_write. Approved by: re (hrs) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6911	2016-06-21 22:19:06 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Konstantin Belousov	d9a503bec4	Fix typo. Note that atomic is still required even for interlocked case. Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2016-06-20 15:45:50 +00:00
Mateusz Guzik	e896fb3bae	vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels This removes calls to empty functions like vop_lock_{pre/post} from common vfs routines. Approved by: re (gjb)	2016-06-17 19:41:30 +00:00
Konstantin Belousov	f8a75278dc	Add VFS interface to flush specified amount of free vnodes belonging to mount points with the given filesystem type, specified by mount vfs_ops pointer. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)	2016-06-17 17:33:25 +00:00
Konstantin Belousov	5c2cf81845	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
Konstantin Belousov	bd07998e0e	Remove XXX comments from kern_thread.c. In one case, there is no reason for it in modern times. In the other case, expand the comment stating instead of doubting. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) X-Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:01:11 +00:00
Konstantin Belousov	13d2cd3b68	Remove code duplication. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) X-Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 11:58:46 +00:00
John Baldwin	fe0bdd1d2c	Move backend-specific fields of kaiocb into a union. This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields. Change the socket and Chelsio DDP backends to use 'backend3' instead of abusing _aiocb_private.status directly. This confines the use of _aiocb_private to the AIO internals in vfs_aio.c. Reviewed by: kib (earlier version) Approved by: re (gjb) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6547	2016-06-15 20:56:45 +00:00
Konstantin Belousov	9fdbfd3b6c	Do not assume that we own the use reference on the covered vnode until we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which wins race with us could dereference the covered vnode, and we are left with the locked freed memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week	2016-06-15 15:56:03 +00:00
Jamie Gritton	932a6e432d	Fix a vnode leak when giving a child jail a too-long path when debug.disablefullpath=1.	2016-06-09 21:59:11 +00:00
Jamie Gritton	cf0313c679	Re-order some jail parameter reading to prevent a vnode leak.	2016-06-09 20:43:14 +00:00
Jamie Gritton	176ff3a066	Clean up some logic in jail error messages, replacing a missing test and a redundant test with a single correct test.	2016-06-09 20:39:57 +00:00
Mariusz Zaborski	fb4cdc96a8	Define tunable instead of using CTLFLAG_RWTUN flag with kern.corefile. The allproc_lock lock used in the sysctl_kern_corefile function is initialized in the procinit function which is called after setting sysctl values at boot. That means if we set kern.corefile at boot we will be trying to use lock with is uninitialized and machine will crash. If we define kern.corefile as tunable instead of using CTFLAG_RWTUN we will not call the sysctl_kern_corefile function and we will not use an uninitialized lock. When machine will boot then we will start using function depending on the lock. Reviewed by: pjd	2016-06-09 20:23:30 +00:00
Conrad Meyer	8a3aeac27b	Add DDB command "kldstat" It prints much the same information as kldstat(8) without any arguments. Suggested by: jhibbits Sponsored by: EMC / Isilon Storage Division	2016-06-09 18:27:41 +00:00
Conrad Meyer	dd6ea7f7bc	kvprintf: Pad %c to width, like %s Sponsored by: EMC / Isilon Storage Division	2016-06-09 18:24:51 +00:00
Jamie Gritton	ef0ddea316	Make sure the OSD methods for jail set and remove can't run concurrently, by holding allprison_lock exclusively (even if only for a moment before downgrading) on all paths that call PR_METHOD_REMOVE. Since they may run on a downgraded lock, it's still possible for them to run concurrently with PR_METHOD_GET, which will need to use the prison lock.	2016-06-09 16:41:41 +00:00

1 2 3 4 5 ...

15030 commits