atexit and __cxa_atexit handlers that are either installed by unloaded
dso, or points to the functions provided by the dso.
Use _rtld_addr_phdr to locate segment information from the address of
private variable belonging to the dso, supplied by crtstuff.c. Provide
utility function __elf_phdr_match_addr to do the match of address against
dso executable segment.
Call back into libthr from __cxa_finalize using weak
__pthread_cxa_finalize symbol to remove any atfork handler which
function points into unloaded object.
The rtld needs private __pthread_cxa_finalize symbol to not require
resolution of the weak undefined symbol at initialization time. This
cannot work, since rtld is relocated before sym_zero is set up.
Idea by: kan
Reviewed by: kan (previous version)
MFC after: 3 weeks
which does not know what is the state of interrupted system call, for
example, open() system call opened a file and the thread is still cancelled,
result is descriptor leak, there are other problems which can cause resource
leak or undeterminable side effect when a thread is cancelled. However, this
is no longer true in new implementation.
In defering mode, a thread is canceled if cancellation request is pending and
later the thread enters a cancellation point, otherwise, a later
pthread_cancel() just causes SIGCANCEL to be sent to the target thread, and
causes target thread to abort system call, userland code in libthr then checks
cancellation state, and cancels the thread if needed. For example, the
cancellation point open(), the thread may be canceled at start,
but later, if it opened a file descriptor, it is not canceled, this avoids
file handle leak. Another example is read(), a thread may be canceled at start
of the function, but later, if it read some bytes from a socket, the thread
is not canceled, the caller then can decide if it should still enable cancelling
or disable it and continue reading data until it thinks it has read all
bytes of a packet, and keeps a protocol stream in health state, if user ignores
partly reading of a packet without disabling cancellation, then second iteration
of read loop cause the thread to be cancelled.
An exception is that the close() cancellation point always closes a file handle
despite whether the thread is cancelled or not.
The old mechanism is still kept, for a functions which is not so easily to
fix a cancellation problem, the rough mechanism is used.
Reviewed by: kib@
is acted upon, or when a thread calls pthread_exit(), the thread first
disables cancellation by setting its cancelability state to
PTHREAD_CANCEL_DISABLE and its cancelability type to
PTHREAD_CANCEL_DEFERRED. The cancelability state remains set to
PTHREAD_CANCEL_DISABLE until the thread has terminated.
It has no effect if a cancellation cleanup handler or thread-specific
data destructor routine changes the cancelability state to
PTHREAD_CANCEL_ENABLE.
now type sema_t is a structure which can be put in a shared memory area,
and multiple processes can operate it concurrently.
User can either use mmap(MAP_SHARED) + sem_init(pshared=1) or use sem_open()
to initialize a shared semaphore.
Named semaphore uses file system and is located in /tmp directory, and its
file name is prefixed with 'SEMD', so now it is chroot or jail friendly.
In simplist cases, both for named and un-named semaphore, userland code
does not have to enter kernel to reduce/increase semaphore's count.
The semaphore is designed to be crash-safe, it means even if an application
is crashed in the middle of operating semaphore, the semaphore state is
still safely recovered by later use, there is no waiter counter maintained
by userland code.
The main semaphore code is in libc and libthr only has some necessary stubs,
this makes it possible that a non-threaded application can use semaphore
without linking to thread library.
Old semaphore implementation is kept libc to maintain binary compatibility.
The kernel ksem API is no longer used in the new implemenation.
Discussed on: threads@
The race condition is believed to be in UMTX_OP_MUTEX_WAKE. On ia64,
we simply go to the kernel to unlock.
The big question is why this is only a race condition on ia64...
MFC after: 3 days
well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
query umtx also if the shared waiters bit is set on a shared lock.
The writer starvation avoidance technique, infact, can lead to shared
waiters on a shared lock which can bring to a missed wakeup and thus
to a deadlock if the right bit is not checked (a notable case is the
writers counterpart to be handled through expired timeouts).
Fix that by checking for the shared waiters bit also when unlocking the
shared locks.
That bug was causing a reported MySQL deadlock.
Many thanks go to Nick Esborn and his employer DesertNet which provided
time and machines to identify and fix this issue.
PR: thread/135673
Reported by: Nick Esborn <nick at desert dot net>
Tested by: Nick Esborn <nick at desert dot net>
Reviewed by: jeff
The most notable is that it is not bumped in rwlock_rdlock_common() when
the hard path (__thr_rwlock_rdlock()) returns successfully.
This can lead to deadlocks in libthr when rwlocks recursion in read mode
happens.
Fix the interested parts by correctly handling rdlock_count.
PR: threads/136345
Reported by: rink
Tested by: rink
Reviewed by: jeff
Approved by: re (kib)
MFC: 2 weeks
by temporary pretending that the process is still multithreaded.
Current malloc lock primitives do nothing for singlethreaded process.
Reviewed by: davidxu, deischen
in order to get the symbol binding state "just so". This is to allow
locking to be activated and not run into recursion problems later.
However, one of the magic bits involves an explicit call to _umtx_op()
to force symbol resolution. It does a wakeup operation on a fake,
uninitialized (ie: random contents) umtx. Since libthr isn't active, this
is harmless. Nothing can match the random wakeup.
However, valgrind finds this and is not amused. Normally I'd just
write a suppression record for it, but the idea of passing random
args to syscalls (on purpose) just doesn't feel right.
does not use any external symbols, thus avoiding possible recursion into
rtld to resolve symbols, when called.
Reviewed by: kan, davidxu
Tested by: rink
MFC after: 1 month
by switching into single-thread mode.
libthr ignores broken use of lock bitmaps used by default rtld locking
implementation, this in turn turns lock handoff in _rtld_thread_init
into NOP. This in turn makes child processes of forked multi-threaded
programs to run with _thr_signal_block still in effect, with most
signals blocked.
Reported by: phk, kib
us working malloc in the fork child of the multithreaded process.
Although POSIX requires that only async-signal safe functions shall be
operable after fork in multithreaded process, not having malloc lower
the quality of our implementation.
Tested by: rink
Discussed with: kan, davidxu
Reviewed by: kan
MFC after: 1 month
Threading library calls _pre before the fork, allowing the rtld to
lock itself to ensure that other threads of the process are out of
dynamic linker. _post releases the locks.
This allows the rtld to have consistent state in the child. Although
child may legitimately call only async-safe functions, the call may
need plt relocation resolution, and this requires working rtld.
Reported and debugging help by: rink
Reviewed by: kan, davidxu
MFC after: 1 month (anyway, not before 7.1 is out)
where critical. Some places still use ps_pread/ps_pwrite directly,
but only need changed when byte-order comes into the picture.
Also, change th_p in td_event_msg_t from a pointer type to
psaddr_t, so that events also work when psaddr_t is widened.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
locked and unlocked completely in userland. by locking and unlocking mutex
in userland, it reduces the total time a mutex is locked by a thread,
in some application code, a mutex only protects a small piece of code, the
code's execution time is less than a simple system call, if a lock contention
happens, however in current implemenation, the lock holder has to extend its
locking time and enter kernel to unlock it, the change avoids this disadvantage,
it first sets mutex to free state and then enters kernel and wake one waiter
up. This improves performance dramatically in some sysbench mutex tests.
Tested by: kris
Sounds great: jeff
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.
_thr_suspend_check() which messes sigmask saved in thread structure.
- Don't suspend a thread has force_exit set.
- In pthread_exit(), if there is a suspension flag set, wake up waiting-
thread after setting PS_DEAD, this causes waiting-thread to break loop
in suspend_common().
that there might be starvations, but because we have already locked the
thread, the cpuset settings will always be done before the new thread
does real-world work.
we set scheduling parameters and cpu binding fully in userland, and
because default scheduling policy is SCHED_RR (time-sharing), we set
default sched_inherit to PTHREAD_SCHED_INHERIT, this saves a system
call.
however if current thread is executing cancellation handler, signal
SIGCANCEL may have already been blocked, this is unexpected, unblock the
signal in new thread if this happens.
MFC after: 1 week
the semantics of pthread_mutex_islocked_np() to return true if and only if
the mutex is held by the current thread.
Obviously, change the regression test to match.
MFC after: 2 weeks
locked. This is intended primarily to support the userland equivalent
of the various *_ASSERT_LOCKED() macros we have in the kernel.
MFC after: 2 weeks
loop count.
2. Add function pthread_mutex_setyieldloops_np to turn a mutex's yield
loop count.
3. Make environment variables PTHREAD_SPINLOOPS and PTHREAD_YIELDLOOPS
to be only used for turnning PTHREAD_MUTEX_ADAPTIVE_NP mutex.
fixes a NULL-dereference of curthread when libstdc+ initializes
the exception handling globals on archs we can't use GNU TLS due
to lack of support in binutils 2.15 (i.e. arm and sparc64), yet,
thus making threaded C++ programs compiled with GCC 4.2.1 work
again on these archs.
Reviewed by: davidxu
MFC after: 3 days
to tune pthread mutex performance:
1. LIBPTHREAD_SPINLOOPS
If a pthread mutex is being locked by another thread, this environment
variable sets total number of spin loops before the current thread
sleeps in kernel, this saves a syscall overhead if the mutex will be
unlocked very soon (well written application code).
2. LIBPTHREAD_YIELDLOOPS
If a pthread mutex is being locked by other threads, this environment
variable sets total number of sched_yield() loops before the currrent
thread sleeps in kernel. if a pthread mutex is locked, the current thread
gives up cpu, but will not sleep in kernel, this means, current thread
does not set contention bit in mutex, but let lock owner to run again
if the owner is on kernel's run queue, and when lock owner unlocks the
mutex, it does not need to enter kernel and do lots of work to resume
mutex waiters, in some cases, this saves lots of syscall overheads for
mutex owner.
In my practice, sometimes LIBPTHREAD_YIELDLOOPS can massively improve performance
than LIBPTHREAD_SPINLOOPS, this depends on application. These two environments
are global to all pthread mutex, there is no interface to set them for each
pthread mutex, the default values are zero, this means spinning is turned off
by default.
is also implemented in glibc and is used by a number of existing
applications (mysql, firefox, etc).
This mutex type is a default mutex with the additional property that
it spins briefly when attempting to acquire a contested lock, doing
trylock operations in userland before entering the kernel to block if
eventually unsuccessful.
The expectation is that applications requesting this mutex type know
that the mutex is likely to be only held for very brief periods, so it
is faster to spin in userland and probably succeed in acquiring the
mutex, than to enter the kernel and sleep, only to be woken up almost
immediately. This can help significantly in certain cases when
pthread mutexes are heavily contended and held for brief durations
(such as mysql).
Spin up to 200 times before entering the kernel, which represents only
a few us on modern CPUs. No performance degradation was observed with
this value and it is sufficient to avoid a large performance drop in
mysql performance in the heavily contended pthread mutex case.
The libkse implementation is a NOP.
Reviewed by: jeff
MFC after: 3 days
_thr_ucond_broadcast, clear condition variable pointer in cancellation
info after returing from _thr_ucond_wait, since kernel has already
dropped the internal lock, so we don't need to unlock it in cancellation
handler again.
is also returned by pthread_detach() if a thread was already
detached, the error code was already documented:
> [EINVAL] The implementation has detected that the value speci-
> fied by thread does not refer to a joinable thread.
into pthread structure to keep track of locked PTHREAD_PRIO_PROTECT mutex,
no real mutex code is changed, the mutex locking and unlocking code should
has same performance as before.
wait(), waitpid() and usleep(), they are internal versions and
should not be cancellation points.
2. Make wait3() as a cancellation point.
3. Move raise() and pause() into file thr_sig.c.
4. Add functions _sigsuspend, _sigwait, _sigtimedwait and _sigwaitinfo,
remove SIGCANCEL bit in wait-set for those functions, the signal is
used internally to implement thread cancellation.
to make it work, turnstile like mechanism to support priority
propagating and other realtime scheduling options in kernel
should be available to userland mutex, for the moment, I just
want to make libthr be simple and efficient thread library.
Discussed with: deischen, julian
* Add posix_memalign().
* Move calloc() from calloc.c to malloc.c. Add a calloc() implementation in
rtld-elf in order to make the loader happy (even though calloc() isn't
used in rtld-elf).
* Add _malloc_prefork() and _malloc_postfork(), and use them instead of
directly manipulating __malloc_lock.
Approved by: phk, markm (mentor)
operation, the caller is blocked util target threads are really
suspended, also avoid suspending a thread when it is holding a
critical lock.
Fix a bug in _thr_ref_delete which tests a never set flag.
rewritten, now timers created with same sigev_notify_attributes will
run in same thread, this allows user to organize which timers can
run in same thread to save some thread resource.
1. fast simple type mutex.
2. __thread tls works.
3. asynchronous cancellation works ( using signal ).
4. thread synchronization is fully based on umtx, mainly, condition
variable and other synchronization objects were rewritten by using
umtx directly. those objects can be shared between processes via
shared memory, it has to change ABI which does not happen yet.
5. default stack size is increased to 1M on 32 bits platform, 2M for
64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.
Okayed by: jeff, mtm, rwatson, scottl
no userland locks are heald, the dead thread lock can no longer protect
access to it. Therefore, instead of using an if (!dead)...else clause
after walking the active threads list test the thread pointer before
deciding not to walk the dead threads list. If the thread pointer is null
it means it was not found in the active threads list and the dead threads
list should be checked.
2. Do not free the stack of a thread that is not marked dead. This is the
2nd and final part of eliminating the race to free a thread's stack.
MFC after: 3 days
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.
Discussed with: davidxu, deischen, marcel
MFC after: 3 days
specified mutex is invalid. In spec parlance 'MAY FAIL' means it's
up to the implementor. So, remove the check for NULL pointers for two
reasons:
1. A mutex may be invalid without necessarily being NULL.
2. If the pointer to the mutex is NULL core-dumping in the
vicinity of the problem is much much much better than failing
in some other part of the code (especially when the application
doesn't check the return value of the function that you oh so
helpfully set to EINVAL).
o In the rwlock code: move a duplicated check inside an if..else to after
the if...else clause.
o When initializing a static rwlock move the initialization check
inside the lock.
o In thr_setschedparam.c: When breaking out of the trylock...retry if busy
loop make sure to reset the mtx pointer to null if the mutex is nolonger
in a queue.
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.
To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.
Reviewed by: mtm@
ABI preservation tested on: i386, ia64
a fork, make sure that the current thread isn't detached and freed. As
a consequence the thread should be inserted into the head of the
active list only once (in the beginning).
followed are: Only 3 functions (pthread_cancel, pthread_setcancelstate,
pthread_setcanceltype) are required to be async-signal-safe by POSIX. None of
the rest of the pthread api is required to be async-signal-safe. This means
that only the three mentioned functions are safe to use from inside
signal handlers.
However, there are certain system/libc calls that are
cancellation points that a caller may call from within a signal handler,
and since they are cancellation points calls have to be made into libthr
to test for cancellation and exit the thread if necessary. So, the
cancellation test and thread exit code paths must be async-signal-safe
as well. A summary of the changes follows:
o Almost all of the code paths that masked signals, as well as locking the
pthread structure now lock only the pthread structure.
o Signals are masked (and left that way) as soon as a thread enters
pthread_exit().
o The active and dead threads locks now explicitly require that signals
are masked.
o Access to the isdead field of the pthread structure is protected by both
the active and dead list locks for writing. Either one is sufficient for
reading.
o The thread state and type fields have been combined into one three-state
switch to make it easier to read without requiring a lock. It doesn't need
a lock for writing (and therefore for reading either) because only the
current thread can write to it and it is an integer value.
o The thread state field of the pthread structure has been eliminated. It
was an unnecessary field that mostly duplicated the flags field, but
required additional locking that would make a lot more code paths require
signal masking. Any truly unique values (such as PS_DEAD) have been
reborn as separate members of the pthread structure.
o Since the mutex and condvar pthread functions are not async-signal-safe
there is no need to muck about with the wait queues when handling
a signal ...
o ... which also removes the need for wrapping signal handlers and sigaction(2).
o The condvar and mutex async-cancellation code had to be revised as a result
of some of these changes, which resulted in semi-unrelated changes which
would have been difficult to work on as a separate commit, so they are
included as well.
The only part of the changes I am worried about is related to locking for
the pthread joining fields. But, I will take a closer look at them once this
mega-patch is committed.
makeing sure the spinlock isn't already in use might be a nice feature to
have in theory, it's hard to implement in practice since the passed in
pointer may not be NULL, but still be an invalid value (i.e. 1..2..3.. etc).
functionality spelled out in SUSv3.
o Signal of 0 means do everything except send the signal
o Check that the signal is not invalid
o Check that the target thread is not dead/invalid