Skip to content

Revert "Use private Thrift members (#6430)"#6455

Open
dlmarion wants to merge 1 commit into
apache:mainfrom
dlmarion:revert-thrift-private-members
Open

Revert "Use private Thrift members (#6430)"#6455
dlmarion wants to merge 1 commit into
apache:mainfrom
dlmarion:revert-thrift-private-members

Conversation

@dlmarion

Copy link
Copy Markdown
Contributor

GarbageCollectorIT.gcLotsOfCandidatesIT has consistently failed on a server since #6430 was merged. The failure happens when the garbage collector process is started with 32 MB of memory. This PR branch exists so that I can test this on that server.

This reverts commit bdcd798.

@dlmarion dlmarion self-assigned this Jun 30, 2026
@dlmarion

Copy link
Copy Markdown
Contributor Author

I think I found a real issue with #6430 and we might want to actually revert it. The original code would access a ByteBuffer directly. #6430 made those member variables private and the methods for accessing the ByteBuffer's make a copy before returning them.

@dlmarion

Copy link
Copy Markdown
Contributor Author

Here's a good example of the impact of this change.

ArrayList<Entry<Key,Value>> entries = new ArrayList<>(scanResult.getResults().size());
for (TKeyValue kv : scanResult.getResults()) {
entries.add(new SimpleImmutableEntry<>(new Key(kv.getKey()), new Value(kv.getValue())));
}

The current version of Key(TKey) looks like:

public Key(TKey tkey) {
this.row = ByteBufferUtil.toBytes(tkey.bufferForRow());
this.colFamily = ByteBufferUtil.toBytes(tkey.bufferForColFamily());
this.colQualifier = ByteBufferUtil.toBytes(tkey.bufferForColQualifier());
this.colVisibility = ByteBufferUtil.toBytes(tkey.bufferForColVisibility());
this.timestamp = tkey.getTimestamp();
this.deleted = false;
if (row == null) {
throw new IllegalArgumentException("null row");
}
if (colFamily == null) {
throw new IllegalArgumentException("null column family");
}
if (colQualifier == null) {
throw new IllegalArgumentException("null column qualifier");
}
if (colVisibility == null) {
throw new IllegalArgumentException("null column visibility");
}
}

The same code prior to #6430 looks like:

public Key(TKey tkey) {
this.row = toBytes(tkey.row);
this.colFamily = toBytes(tkey.colFamily);
this.colQualifier = toBytes(tkey.colQualifier);
this.colVisibility = toBytes(tkey.colVisibility);
this.timestamp = tkey.timestamp;
this.deleted = false;
if (row == null) {
throw new IllegalArgumentException("null row");
}
if (colFamily == null) {
throw new IllegalArgumentException("null column family");
}
if (colQualifier == null) {
throw new IllegalArgumentException("null column qualifier");
}
if (colVisibility == null) {
throw new IllegalArgumentException("null column visibility");
}

The calls to bufferForX create a copy of the ByteBuffer.

@dlmarion dlmarion marked this pull request as ready for review June 30, 2026 18:48
@dlmarion dlmarion requested review from DomGarguilo and ctubbsii June 30, 2026 18:49
@ctubbsii

Copy link
Copy Markdown
Member

I'm currently looking into alternatives. I think the private thrift members is a nice change, but I also recognize the problem with the extra protective copy. Please hold on merging this for now, while I investigate.

@ctubbsii

Copy link
Copy Markdown
Member

I created #6457 as an alternative to reverting all this. It will remove the unwanted protective copies that are likely causing the performance problem.

@ctubbsii ctubbsii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to resolve the issue with #6457 instead, if works.

@dlmarion

dlmarion commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

GarbageCollectorIT is now passing again with these changes. Will probably end up closing this in favor of #6457 though if that also fixes the issue.

@ctubbsii

ctubbsii commented Jul 1, 2026

Copy link
Copy Markdown
Member

GarbageCollectorIT is now passing again with these changes. Will probably end up closing this in favor of #6457 though if that also fixes the issue.

I saw it fail due to the same timeout on the first pass, and then pass on the second attempt. I think that test is flaky already, and a revert will not entirely fix the issue. It's possible the changes in #6430 contributed to a performance issue, but I do not think the performance of that particular test is a reliable enough metric to determine whether #6430 is a good change or not.

Even though I am reluctant to revert #6430, for fear of reversing progress, I do think there's more that probably needs to be done here. As it turns out, the getters that return a byte array is not even the code path that creates redundant protective copies. It's the versions of the methods that look like .bufferForValue() that are the ones that are doing unnecessary copies. The method that returns a byte array only makes a copy when it needs to truncate a byte buffer's backing array to size, and that is done at most once (and it's also probably important to do that, because the original backing array is a buffer in the TFramedTransport, and I'm not entirely convinced that it isn't reusable and unsafe as a backing array at this layer; I'd need to dig more into the deserialization code to understand that better).

So really, we should be avoiding the .bufferForValue() style methods completely. In the vast majority of cases, we actually want the byte array anyway, and were just doing our own array extraction from the ByteBuffer. But we don't even need to do that if we use the correct getter that returns an array. Previously, I was thinking that the problem might be that we need to use the getter that returns the ByteBuffer more... but I now understand that is the opposite of what we need to do.

I have abandoned #6457

@ctubbsii

ctubbsii commented Jul 1, 2026

Copy link
Copy Markdown
Member

My next attempt to address the issue is #6459

@dlmarion

dlmarion commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

As it turns out, the getters that return a byte array is not even the code path that creates redundant protective copies. It's the versions of the methods that look like .bufferForValue() that are the ones that are doing unnecessary copies.

I don't think that's correct. It appears to me that the code currently in main performs a copy for both calls to getX and bufferForX. Here's what I'm seeing using TKey as an example:

public byte[] getRow() {
setRow(org.apache.thrift.TBaseHelper.rightSize(row));
return row == null ? null : row.array();
}
public java.nio.ByteBuffer bufferForRow() {
return org.apache.thrift.TBaseHelper.copyBinary(row);
}

The call to bufferForRow calls TBaseHelper.copyBinary. The call to getRow calls TBaseHelper.rightSize, then calls setRow(below) which also calls TBaseHelper.copyBinary.

public TKey setRow(@org.apache.thrift.annotation.Nullable java.nio.ByteBuffer row) {
this.row = org.apache.thrift.TBaseHelper.copyBinary(row);
return this;
}

Your change in #6457 to use the unsafe_binaries Thrift option removed the copy from the bufferForX methods but not the getX methods.

@dlmarion

dlmarion commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

https://issues.apache.org/jira/browse/THRIFT-4555 is the original issue where unsafe_binaries was added to Thrift.

@ctubbsii

ctubbsii commented Jul 2, 2026

Copy link
Copy Markdown
Member

I was focused on the behavior of the rightSize method and overlooked that the getRow method would call the setter, where it does another protective copy.

This is frustrating. The setter should have a protective copy, but the getRow method shouldn't be calling it. It should be doing some internal setRow instead. The unsafe_binaries option is nice, but we really only should have unsafe access directly via the bufferFor methods; we still want protective copies on the setters and constructor. I think we'll need to make a fix upstream, and am in favor of reverting until it's fixed upstream.

The alternative to reverting at this point seems to either modify the code after it is generated, to avoid calling the setter in the getter, or to strip out the protective copy only from the bufferFor methods without using the unsafe_binaries option, or to use the unsafe_binaries option and make sure we're doing our own protective copies in calling code everywhere we call the setter or constructor. None of those seem like good options, so hopefully it's not so bad to push a fix upstream.

@ctubbsii

ctubbsii commented Jul 3, 2026

Copy link
Copy Markdown
Member

I updated #6459 to work around the apparent bug in Thrift that calls the public API setter and making an unnecessary copy when the array is requested from the getter.

@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants