Commit 249cfdf
FIX: VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters (#444)
### Work Item / Issue Reference
> ADO Work Item: [AB#42604](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/42604)
>
> GitHub Issue: #435
-------------------------------------------------------------------
### Summary
This pull request improves the handling of character encoding and buffer
sizing for SQL `CHAR`/`VARCHAR` data in the ODBC Python bindings,
especially for cross-platform compatibility between Linux/macOS and
Windows. The changes ensure that character data is decoded correctly and
that buffer sizes are sufficient to prevent corruption or truncation
when dealing with multi-byte UTF-8 data returned by the ODBC driver on
non-Windows systems.
**Character Encoding Handling:**
- Introduced the `GetEffectiveCharDecoding` function to determine the
correct decoding to use for SQL `CHAR` data: always UTF-8 on Linux/macOS
(since the ODBC driver returns UTF-8), and the user-specified encoding
on Windows. This function is now used consistently throughout the
codebase to select the decoding method.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R42-R54)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2878-R2903)
[[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2955-R2991)
[[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3694)
[[5]](diffhunk://#diff-85167a2d59779df18704284ab7ce46220c3619408fbf22c631ffdf29f794d635R670)
[[6]](diffhunk://#diff-85167a2d59779df18704284ab7ce46220c3619408fbf22c631ffdf29f794d635L814-R840)
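As a rough illustration of the selection rule, here is a Python sketch. `effective_char_decoding` is a hypothetical stand-in for the C++ `GetEffectiveCharDecoding`, not the binding code itself. It assumes `sys.platform` is an adequate proxy for "Windows vs. Linux/macOS".

```python
import sys


def effective_char_decoding(user_encoding: str, platform: str = sys.platform) -> str:
    """Pick the codec for SQL CHAR/VARCHAR bytes returned by the ODBC driver.

    Hypothetical analogue of GetEffectiveCharDecoding: on Linux/macOS the
    driver hands back UTF-8, so the user's setting is not used for decoding;
    on Windows the bytes arrive in the user-configured (native) encoding.
    """
    if platform.startswith("win"):
        return user_encoding
    return "utf-8"


print(effective_char_decoding("cp1252", platform="linux"))   # utf-8
print(effective_char_decoding("cp1252", platform="darwin"))  # utf-8
print(effective_char_decoding("cp1252", platform="win32"))   # cp1252
```

Keeping the rule in one function means every fetch path asks the same question, instead of each path hard-coding its own assumption about the driver's output encoding.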
**Buffer Sizing for UTF-8:**
- Updated buffer allocation logic to use `columnSize * 4 + 1` for SQL
`CHAR`/`VARCHAR` columns on Linux/macOS, accounting for the worst-case
UTF-8 expansion (up to 4 bytes per character), preventing data
truncation when multi-byte characters are present at the column
boundary.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2944-R2964)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3477-R3484)
[[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3662-R3678)
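The sizing arithmetic can be checked with a small Python sketch. `char_buffer_size` is a hypothetical helper. It assumes Windows keeps the `columnSize + 1` allocation, because the change above only describes the Linux/macOS sizing.

```python
def char_buffer_size(column_size: int, is_windows: bool) -> int:
    """Bytes to allocate for fetching a CHAR/VARCHAR(column_size) value (hypothetical)."""
    if is_windows:
        # Assumed: native single-byte code page, 1 byte per char + null terminator.
        return column_size + 1
    # Linux/macOS: driver returns UTF-8, worst case 4 bytes per char + terminator.
    return column_size * 4 + 1


value = "café René!"                      # 10 characters, fits VARCHAR(10)
utf8_len = len(value.encode("utf-8"))     # 12 bytes: each é is 0xC3 0xA9
old_size = 10 + 1                         # pre-fix allocation: columnSize + 1
new_size = char_buffer_size(10, is_windows=False)

print(utf8_len, old_size, new_size)                        # 12 11 41
print(utf8_len + 1 <= old_size, utf8_len + 1 <= new_size)  # False True
```

The `+ 1` in each comparison accounts for the null terminator that the driver writes into character buffers.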
**Decoding and Data Fetching:**
- Modified all data fetching and decoding paths (`FetchLobColumnData`,
`SQLGetData_wrap`, `ProcessChar`, and batch fetch functions) to use the
effective character encoding and the correct buffer sizes, ensuring
consistent and correct decoding regardless of platform.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2878-R2903)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2955-R2991)
[[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R3694)
[[4]](diffhunk://#diff-85167a2d59779df18704284ab7ce46220c3619408fbf22c631ffdf29f794d635L814-R840)
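To show what "consistent regardless of platform" means in practice, this sketch decodes one CHAR/VARCHAR column of a simulated batch. `decode_char_cells` and its `is_windows` flag are hypothetical. The byte strings stand in for what the driver is assumed to return on each platform, with `None` marking SQL NULL.

```python
def effective_char_decoding(user_encoding: str, is_windows: bool) -> str:
    # Hypothetical rule: UTF-8 on Linux/macOS, the user's encoding on Windows.
    return user_encoding if is_windows else "utf-8"


def decode_char_cells(cells, user_encoding, is_windows):
    """Decode one CHAR/VARCHAR column of a fetched batch (None = SQL NULL)."""
    codec = effective_char_decoding(user_encoding, is_windows)
    return [None if raw is None else raw.decode(codec) for raw in cells]


# Simulated driver output for the same two rows on each platform.
linux_cells = ["café".encode("utf-8"), None]      # b'caf\xc3\xa9'
windows_cells = ["café".encode("cp1252"), None]   # b'caf\xe9'

print(decode_char_cells(linux_cells, "cp1252", is_windows=False))   # ['café', None]
print(decode_char_cells(windows_cells, "cp1252", is_windows=True))  # ['café', None]
```

The codec is resolved once per column rather than once per cell, which matches the idea of a single effective encoding being passed down through the batch path.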
**API Changes:**
- Updated the `FetchBatchData` function and its callers to accept and
propagate the character encoding parameter, ensuring the encoding
context is preserved throughout the data-fetching stack.
[[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L3601-R3632)
[[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L4088-R4135)
[[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L4224-R4272)
**Minor Fixes:**
- Minor formatting and logging improvements for error messages and
function signatures.
These changes collectively improve correctness and reliability when
handling string data from SQL databases, especially in multi-platform
environments.
**CP1252 VARCHAR Boundary Fix — Summary**
**Problem:** VARCHAR columns containing non-ASCII CP1252 characters (e.g., é,
ñ, ö) returned corrupted data when the string length exactly equaled the
column size. Inserting "café René!" into a VARCHAR(10) column returned "©!".
**Root Cause:**
Three bugs in `ddbc_bindings.cpp`:
**Undersized buffer:** `SQLGetData`/`SQLBindCol` allocated `columnSize + 1`
bytes, but on Linux/macOS the ODBC driver converts server data to UTF-8,
where CP1252 é (1 byte, 0xE9) becomes 0xC3 0xA9 (2 bytes). A 10-character
string with 2 accented characters needs 12 bytes, which exceeds the 11-byte
buffer → truncation → LOB fallback re-reads already-consumed data → corruption.
In the example above, only the 2 unread bytes (0xA9 and `!`) remained for the
second read, and decoding them as CP1252 yields "©!".
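The failure can be reproduced byte-for-byte with a simplified Python model. `simulate_old_fetch` is hypothetical. It assumes that after truncation the LOB fallback's next read receives only the bytes the driver had not yet handed out, and that the old code decoded them with the user's `charEncoding` (CP1252 here).

```python
def simulate_old_fetch(value: str, column_size: int, user_encoding: str = "cp1252") -> str:
    """Simplified model of the pre-fix Linux/macOS VARCHAR path (hypothetical)."""
    # The driver converts the server value to UTF-8 (é: 1 byte -> 2 bytes).
    driver_bytes = value.encode("utf-8")
    # Pre-fix allocation: columnSize + 1 bytes, one of them for the null terminator.
    buffer_size = column_size + 1
    first_chunk = driver_bytes[: buffer_size - 1]
    if len(first_chunk) == len(driver_bytes):
        # Everything fit; the old code decoded with the user's charEncoding.
        return first_chunk.decode(user_encoding)
    # Truncated: the fallback reads again, but only the unread tail is left,
    # and the first chunk is dropped.
    tail = driver_bytes[len(first_chunk):]
    return tail.decode(user_encoding)


print(len("café René!"), len("café René!".encode("utf-8")))  # 10 12
print(simulate_old_fetch("café René!", 10))                   # ©!
print(simulate_old_fetch("cafe Rene!", 10))                   # cafe Rene!
```

The all-ASCII string is exactly 10 bytes in UTF-8, so it fits the old buffer and never reaches the fallback, which is why the bug only appeared with non-ASCII characters at the column boundary.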
**Wrong decode encoding:** After the buffer fix, the data arrived intact
but was decoded with the user's `charEncoding` (CP1252) instead of UTF-8.
Because ODBC on Linux/macOS has already converted the data to UTF-8,
reinterpreting those bytes as CP1252 produced mojibake ("cafÃ© RenÃ©!").
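The mojibake is easy to reproduce in plain Python. The sketch below (not the binding code) takes the UTF-8 bytes that the driver returns on Linux/macOS and decodes them both ways.

```python
# What the driver returns on Linux/macOS for "café René!": UTF-8 bytes.
utf8_bytes = "café René!".encode("utf-8")

wrong = utf8_bytes.decode("cp1252")  # old behavior: user's charEncoding
right = utf8_bytes.decode("utf-8")   # fixed behavior: UTF-8 on Linux/macOS

print(wrong)  # cafÃ© RenÃ©!
print(right)  # café René!
```

Each é is the byte pair 0xC3 0xA9. CP1252 maps those bytes separately to "Ã" and "©".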
**`ProcessChar` assumed UTF-8 on all platforms:** The batch/`fetchall` hot
path used `PyUnicode_FromStringAndSize`, which assumes UTF-8 input. That is
correct on Linux/macOS (ODBC returns UTF-8) but wrong on Windows, where ODBC
returns the native encoding (e.g., CP1252).
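`PyUnicode_FromStringAndSize` decodes its input strictly as UTF-8, so its Python-level equivalent is `bytes.decode("utf-8")`. This sketch feeds that kind of decoder the CP1252 bytes that Windows ODBC returns. The helper `utf8_only` and the sample string are illustrative assumptions.

```python
# Bytes as Windows ODBC would return them for a CP1252 column (é is 0xE9).
cp1252_bytes = "café René!".encode("cp1252")


def utf8_only(raw: bytes) -> str:
    # Mirrors a PyUnicode_FromStringAndSize-style fast path: strict UTF-8.
    return raw.decode("utf-8")


try:
    utf8_only(cp1252_bytes)
except UnicodeDecodeError as exc:
    print("UTF-8-only path fails:", exc)

# Decoding with the effective (Windows) encoding recovers the text.
print(cp1252_bytes.decode("cp1252"))  # café René!
```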
---------
Co-authored-by: subrata-ms <subrata@microsoft.com>
Co-authored-by: gargsaumya <saumyagarg.100@gmail.com>

Parent: 7ebef4f
4 files changed (849 additions, 26 deletions) across:
- mssql_python/pybind
- tests