Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The pagination results order of the logs is incorrect #9705

Open
honganan opened this issue Jun 14, 2023 · 3 comments · May be fixed by #9759
Open

The pagination results order of the logs is incorrect #9705

honganan opened this issue Jun 14, 2023 · 3 comments · May be fixed by #9759
Labels
component/query-frontend type/bug Somehing is not working as expected

Comments

@honganan
Copy link
Contributor

honganan commented Jun 14, 2023

Describe the bug
query-frontend split query and send to queriers, then collect the response data and merge, then return the first limit lines sort by timestamp.

However, In some situations, if the total number of log lines returned by queriers are bigger than the limit, the last returned result may not be the first limit lines. I think problem comes from the heap sort (not sure if it's api usage problem).

I found this in our production environment, after research I create a simple test case below which can reproduce it easier.

To Reproduce

  1. Below is a test case, put it in codec_test.go#Test_codec_MergeResponse and run it. It would fails.
{
	"loki backward special",
	[]queryrangebase.Response{
		&LokiResponse{
			Status:    loghttp.QueryStatusSuccess,
			Direction: logproto.BACKWARD,
			Limit:     10,
			Version:   1,
			Data: LokiData{
				ResultType: loghttp.ResultTypeStream,
				Result: []logproto.Stream{
					{
						Labels: `{foo="bar", level="debug"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 11), Line: "11"},
						},
					},
					{
						Labels: `{foo="bar", level="error"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 12), Line: "12"},
							{Timestamp: time.Unix(0, 10), Line: "10"},
							{Timestamp: time.Unix(0, 9), Line: "9"},
							{Timestamp: time.Unix(0, 8), Line: "8"},
						},
					},
				},
			},
		},
		&LokiResponse{
			Status:    loghttp.QueryStatusSuccess,
			Direction: logproto.BACKWARD,
			Limit:     10,
			Version:   1,
			Data: LokiData{
				ResultType: loghttp.ResultTypeStream,
				Result: []logproto.Stream{
					{
						Labels: `{foo="bar", level="debug"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 6), Line: "6"},
						},
					},
					{
						Labels: `{foo="bar", level="error"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 7), Line: "7"},
							{Timestamp: time.Unix(0, 5), Line: "5"},
							{Timestamp: time.Unix(0, 4), Line: "4"},
						},
					},
				},
			},
		},
		&LokiResponse{
			Status:    loghttp.QueryStatusSuccess,
			Direction: logproto.BACKWARD,
			Limit:     10,
			Version:   1,
			Data: LokiData{
				ResultType: loghttp.ResultTypeStream,
				Result: []logproto.Stream{
					{
						Labels: `{foo="bar", level="debug"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 2), Line: "2"},
							{Timestamp: time.Unix(0, 1), Line: "1"},
						},
					},
					{
						Labels: `{foo="bar", level="error"}`,
						Entries: []logproto.Entry{
							{Timestamp: time.Unix(0, 3), Line: "3"},
						},
					},
				},
			},
		},
	},
	&LokiResponse{
		Status:     loghttp.QueryStatusSuccess,
		Direction:  logproto.BACKWARD,
		Limit:      10,
		Version:    1,
		Statistics: stats.Result{Summary: stats.Summary{Splits: 3}},
		Data: LokiData{
			ResultType: loghttp.ResultTypeStream,
			Result: []logproto.Stream{
				{
					Labels: `{foo="bar", level="error"}`,
					Entries: []logproto.Entry{
						{Timestamp: time.Unix(0, 12), Line: "12"},
						{Timestamp: time.Unix(0, 10), Line: "10"},
						{Timestamp: time.Unix(0, 9), Line: "9"},
						{Timestamp: time.Unix(0, 8), Line: "8"},
						{Timestamp: time.Unix(0, 7), Line: "7"},
						{Timestamp: time.Unix(0, 5), Line: "5"},
						{Timestamp: time.Unix(0, 4), Line: "4"},
						{Timestamp: time.Unix(0, 3), Line: "3"},
					},
				},
				{
					Labels: `{foo="bar", level="debug"}`,
					Entries: []logproto.Entry{
						{Timestamp: time.Unix(0, 11), Line: "11"},
						{Timestamp: time.Unix(0, 6), Line: "6"},
					},
				},
			},
		},
	},
	false,
},

Expected behavior
There are 12 lines in total (nanoseconds 1 ~ 12), and limit is 10, expect return 12 ~ 3.

@honganan honganan changed the title Logs result merge order incorrect The pagination results order of the logs is incorrect Jun 14, 2023
@honganan
Copy link
Contributor Author

honganan commented Jun 14, 2023

The deeper reason is method #mergeOrderedNonOverlappingStreams called by MergeResponse() can only guarantee streams'(first entry) time order, it pops the first stream from heap and put all of it's entries into result(It's possible that not all entries in this stream are the finally topN lines), but cannot make sure entries between several streams returned in order.

A reasonable way is to traverse each stream and get the latest time line and then continue loop to get the next.

In fact it's method name have said that streams should be non-overlapping, but in Loki different streams' entries could have overlapping time, Like above test case.

@JStickler JStickler added the type/bug Somehing is not working as expected label Jun 14, 2023
@chaudum
Copy link
Contributor

chaudum commented Jun 19, 2023

Hi @honganan Thanks for the detailed report. We will have a look at the issue.

@honganan
Copy link
Contributor Author

honganan commented Jun 21, 2023

Hi @honganan Thanks for the detailed report. We will have a look at the issue.

That's great! I have changed the test case as some Statistics part mistakes will make it fails too, this issue problem still exists.

I have made a fix locally, and I am trying to send a PR.

honganan added a commit to honganan/loki that referenced this issue Jun 21, 2023
@honganan honganan linked a pull request Jun 21, 2023 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/query-frontend type/bug Somehing is not working as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants