mattermost/server/channels/store/sqlstore/context.go

38 lines
1.2 KiB
Go
Raw Permalink Normal View History

MM-30026: Use DB master when getting team members from a session (#16170) * MM-30026: Use DB master when getting team members from a session A race condition happens when the read-replica isn't updated yet by the time a session expiry message reaches another node in the cluster. Here is the sequence of events that can cause it: - Server1 gets any request which has to wipe session cache. - The SQL query is written to DB master, and a cluster message is propagated to clear the session cache for that user. - Now before the read-replica is updated with the master’s update, the cluster message reaches Server2. The session cache is wiped out for that user. - _Any random_ request for that user hits Server2. Does NOT have to be the update team name request. The request does not find the value in session cache, because it’s wiped off, and picks it up from the DB. Surprise surprise, it gets the stale value. Sticks it into the cache. By now, the read-replica is updated. But guess what, we aren’t going to ask the DB anymore, because we have it in the cache. And the cache has the stale value. We use a temporary approach for now by introducing a context in the DB calls so that the useMaster information can be easily passed. And this has the added advantage of reusing the same context for future DB calls in case it happens. And we can also add more context keys as needed. A proper approach needs some architectural changes. See the issue for more details. ```release-note Fixed a bug where a session will hold on to a cached value in an HA setup with read-replicas configured. ``` * incorporate review comments Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
2020-11-10 00:13:45 -05:00
// Copyright (c) 2015-present Mattermost, Inc. All Rights Reserved.
// See LICENSE.txt for license information.
package sqlstore
import (
"context"
"github.com/mattermost/mattermost/server/public/shared/request"
MM-60441: Re-index public channels when a user joins a team (#33400) * Index all public channels when a user joins a team * Precompute team members for indexChannelsForTeam * Refactor RequestContextWithMaster to store package This way, we can import it from both the sqlstore and the searchlayer packages. The alternative for this is duplicating the code in those two packages, but that will *not* work: The context package expects custom types for the keys stored in it, so that different packages never clash with each other when trying to register a new key. See the docs for the WithValue function: https://pkg.go.dev/context#WithValue If we try to duplicate the storeContextKey type in both the sqlstore and searchlayer packages, although they *look* the same, they are not, and HasMaster will fail to get the value of the storeContextKey(useMaster) key if it's from the other package. * Use master in call to GetTeamMembersForChannel In GetTeamMembersForChannel, use the DB from the newly passed context, which will be the receiving context everywhere except in the call done from indexChannelsForTeam, to avoid the read after write issue when saving a team member. * Fix GetPublicChannelsForTeam paging We were using the page and perPage arguments as is in the call to GetPublicChannelsForTeam, but that function expects and offset and a limit as understood by SQL. Although perPage and limit are interchangeable, offset is not equal to page, but to page * perPage. * Add a synchronous bulk indexer for Opensearch * Implement Opensearch's SyncBulkIndexChannels * Add a synchronous bulk indexer for Elasticsearch * Implement Elasticsearch's SynkBulkIndexChannels * Test SyncBulkIndexChannels * make mocks * Bulk index channels on indexChannelsForTeam * Handle error from SyncBulkIndexChannels * Fix style * Revert indexChannelWithTeamMembers refactor * Remove defensive code on sync bulk processor * Revert "Add a synchronous bulk indexer for Opensearch" This reverts commit bfe4671d96bffa9ca27ed3c655fc5527b72bbafb. * Revert "Add a synchronous bulk indexer for Elasticsearch" This reverts commit 6643ae3f30c461544d0861aec7dee1f24e507c37. * Refactor bulk indexers with a common interface * Test all the different implementations Assisted by Claude * Remove debug statements * Refactor common code into _stop * Rename getUserIDsFor{,Private}Channel * Wrap error * Make perPage a const * Fix typos * Call GetTeamsForUser only if needed * Differentiate errors for sync/async processors --------- Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com> Co-authored-by: Mattermost Build <build@mattermost.com>
2025-08-25 13:28:19 -04:00
"github.com/mattermost/mattermost/server/v8/channels/store"
MM-30026: Use DB master when getting team members from a session (#16170) * MM-30026: Use DB master when getting team members from a session A race condition happens when the read-replica isn't updated yet by the time a session expiry message reaches another node in the cluster. Here is the sequence of events that can cause it: - Server1 gets any request which has to wipe session cache. - The SQL query is written to DB master, and a cluster message is propagated to clear the session cache for that user. - Now before the read-replica is updated with the master’s update, the cluster message reaches Server2. The session cache is wiped out for that user. - _Any random_ request for that user hits Server2. Does NOT have to be the update team name request. The request does not find the value in session cache, because it’s wiped off, and picks it up from the DB. Surprise surprise, it gets the stale value. Sticks it into the cache. By now, the read-replica is updated. But guess what, we aren’t going to ask the DB anymore, because we have it in the cache. And the cache has the stale value. We use a temporary approach for now by introducing a context in the DB calls so that the useMaster information can be easily passed. And this has the added advantage of reusing the same context for future DB calls in case it happens. And we can also add more context keys as needed. A proper approach needs some architectural changes. See the issue for more details. ```release-note Fixed a bug where a session will hold on to a cached value in an HA setup with read-replicas configured. ``` * incorporate review comments Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
2020-11-10 00:13:45 -05:00
)
// WithMaster adds the context value that master DB should be selected for this request.
//
// Deprecated: This method is deprecated and there's ongoing change to use `request.CTX` across
// instead of `context.Context`. Please use `RequestContextWithMaster` instead.
func WithMaster(ctx context.Context) context.Context {
MM-60441: Re-index public channels when a user joins a team (#33400) * Index all public channels when a user joins a team * Precompute team members for indexChannelsForTeam * Refactor RequestContextWithMaster to store package This way, we can import it from both the sqlstore and the searchlayer packages. The alternative for this is duplicating the code in those two packages, but that will *not* work: The context package expects custom types for the keys stored in it, so that different packages never clash with each other when trying to register a new key. See the docs for the WithValue function: https://pkg.go.dev/context#WithValue If we try to duplicate the storeContextKey type in both the sqlstore and searchlayer packages, although they *look* the same, they are not, and HasMaster will fail to get the value of the storeContextKey(useMaster) key if it's from the other package. * Use master in call to GetTeamMembersForChannel In GetTeamMembersForChannel, use the DB from the newly passed context, which will be the receiving context everywhere except in the call done from indexChannelsForTeam, to avoid the read after write issue when saving a team member. * Fix GetPublicChannelsForTeam paging We were using the page and perPage arguments as is in the call to GetPublicChannelsForTeam, but that function expects and offset and a limit as understood by SQL. Although perPage and limit are interchangeable, offset is not equal to page, but to page * perPage. * Add a synchronous bulk indexer for Opensearch * Implement Opensearch's SyncBulkIndexChannels * Add a synchronous bulk indexer for Elasticsearch * Implement Elasticsearch's SynkBulkIndexChannels * Test SyncBulkIndexChannels * make mocks * Bulk index channels on indexChannelsForTeam * Handle error from SyncBulkIndexChannels * Fix style * Revert indexChannelWithTeamMembers refactor * Remove defensive code on sync bulk processor * Revert "Add a synchronous bulk indexer for Opensearch" This reverts commit bfe4671d96bffa9ca27ed3c655fc5527b72bbafb. * Revert "Add a synchronous bulk indexer for Elasticsearch" This reverts commit 6643ae3f30c461544d0861aec7dee1f24e507c37. * Refactor bulk indexers with a common interface * Test all the different implementations Assisted by Claude * Remove debug statements * Refactor common code into _stop * Rename getUserIDsFor{,Private}Channel * Wrap error * Make perPage a const * Fix typos * Call GetTeamsForUser only if needed * Differentiate errors for sync/async processors --------- Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com> Co-authored-by: Mattermost Build <build@mattermost.com>
2025-08-25 13:28:19 -04:00
return store.WithMaster(ctx)
MM-30026: Use DB master when getting team members from a session (#16170) * MM-30026: Use DB master when getting team members from a session A race condition happens when the read-replica isn't updated yet by the time a session expiry message reaches another node in the cluster. Here is the sequence of events that can cause it: - Server1 gets any request which has to wipe session cache. - The SQL query is written to DB master, and a cluster message is propagated to clear the session cache for that user. - Now before the read-replica is updated with the master’s update, the cluster message reaches Server2. The session cache is wiped out for that user. - _Any random_ request for that user hits Server2. Does NOT have to be the update team name request. The request does not find the value in session cache, because it’s wiped off, and picks it up from the DB. Surprise surprise, it gets the stale value. Sticks it into the cache. By now, the read-replica is updated. But guess what, we aren’t going to ask the DB anymore, because we have it in the cache. And the cache has the stale value. We use a temporary approach for now by introducing a context in the DB calls so that the useMaster information can be easily passed. And this has the added advantage of reusing the same context for future DB calls in case it happens. And we can also add more context keys as needed. A proper approach needs some architectural changes. See the issue for more details. ```release-note Fixed a bug where a session will hold on to a cached value in an HA setup with read-replicas configured. ``` * incorporate review comments Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
2020-11-10 00:13:45 -05:00
}
// RequestContextWithMaster adds the context value that master DB should be selected for this request.
func RequestContextWithMaster(rctx request.CTX) request.CTX {
return store.RequestContextWithMaster(rctx)
}
// HasMaster is a helper function to check whether master DB should be selected or not.
func HasMaster(ctx context.Context) bool {
MM-60441: Re-index public channels when a user joins a team (#33400) * Index all public channels when a user joins a team * Precompute team members for indexChannelsForTeam * Refactor RequestContextWithMaster to store package This way, we can import it from both the sqlstore and the searchlayer packages. The alternative for this is duplicating the code in those two packages, but that will *not* work: The context package expects custom types for the keys stored in it, so that different packages never clash with each other when trying to register a new key. See the docs for the WithValue function: https://pkg.go.dev/context#WithValue If we try to duplicate the storeContextKey type in both the sqlstore and searchlayer packages, although they *look* the same, they are not, and HasMaster will fail to get the value of the storeContextKey(useMaster) key if it's from the other package. * Use master in call to GetTeamMembersForChannel In GetTeamMembersForChannel, use the DB from the newly passed context, which will be the receiving context everywhere except in the call done from indexChannelsForTeam, to avoid the read after write issue when saving a team member. * Fix GetPublicChannelsForTeam paging We were using the page and perPage arguments as is in the call to GetPublicChannelsForTeam, but that function expects and offset and a limit as understood by SQL. Although perPage and limit are interchangeable, offset is not equal to page, but to page * perPage. * Add a synchronous bulk indexer for Opensearch * Implement Opensearch's SyncBulkIndexChannels * Add a synchronous bulk indexer for Elasticsearch * Implement Elasticsearch's SynkBulkIndexChannels * Test SyncBulkIndexChannels * make mocks * Bulk index channels on indexChannelsForTeam * Handle error from SyncBulkIndexChannels * Fix style * Revert indexChannelWithTeamMembers refactor * Remove defensive code on sync bulk processor * Revert "Add a synchronous bulk indexer for Opensearch" This reverts commit bfe4671d96bffa9ca27ed3c655fc5527b72bbafb. * Revert "Add a synchronous bulk indexer for Elasticsearch" This reverts commit 6643ae3f30c461544d0861aec7dee1f24e507c37. * Refactor bulk indexers with a common interface * Test all the different implementations Assisted by Claude * Remove debug statements * Refactor common code into _stop * Rename getUserIDsFor{,Private}Channel * Wrap error * Make perPage a const * Fix typos * Call GetTeamsForUser only if needed * Differentiate errors for sync/async processors --------- Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com> Co-authored-by: Mattermost Build <build@mattermost.com>
2025-08-25 13:28:19 -04:00
return store.HasMaster(ctx)
MM-30026: Use DB master when getting team members from a session (#16170) * MM-30026: Use DB master when getting team members from a session A race condition happens when the read-replica isn't updated yet by the time a session expiry message reaches another node in the cluster. Here is the sequence of events that can cause it: - Server1 gets any request which has to wipe session cache. - The SQL query is written to DB master, and a cluster message is propagated to clear the session cache for that user. - Now before the read-replica is updated with the master’s update, the cluster message reaches Server2. The session cache is wiped out for that user. - _Any random_ request for that user hits Server2. Does NOT have to be the update team name request. The request does not find the value in session cache, because it’s wiped off, and picks it up from the DB. Surprise surprise, it gets the stale value. Sticks it into the cache. By now, the read-replica is updated. But guess what, we aren’t going to ask the DB anymore, because we have it in the cache. And the cache has the stale value. We use a temporary approach for now by introducing a context in the DB calls so that the useMaster information can be easily passed. And this has the added advantage of reusing the same context for future DB calls in case it happens. And we can also add more context keys as needed. A proper approach needs some architectural changes. See the issue for more details. ```release-note Fixed a bug where a session will hold on to a cached value in an HA setup with read-replicas configured. ``` * incorporate review comments Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
2020-11-10 00:13:45 -05:00
}
// DBXFromContext is a helper utility that returns the sqlx DB handle from a given context.
func (ss *SqlStore) DBXFromContext(ctx context.Context) *sqlxDBWrapper {
if HasMaster(ctx) {
return ss.GetMaster()
}
return ss.GetReplica()
}