
Commit cf7da77

feat: add -f codepage flag for input/output encoding
1 parent ca107b8 commit cf7da77

11 files changed

Lines changed: 1318 additions & 5 deletions


README.md

Lines changed: 74 additions & 0 deletions
@@ -151,6 +151,7 @@ The following switches have different behavior in this version of `sqlcmd` compa
151151
- To provide the value of the host name in the server certificate when using strict encryption, pass the host name with `-F`. Example: `-Ns -F myhost.domain.com`
152152
- More information about client/server encryption negotiation can be found at <https://docs.microsoft.com/openspecs/windows_protocols/ms-tds/60f56408-0188-4cd5-8b90-25c6f2423868>
153153
- `-u` The generated Unicode output file will have the UTF-16 Little-Endian byte order mark (BOM) written to it.
154+
- `-f` Specifies the code page for input and output files. See [Code Page Support](#code-page-support) below for details and examples.
154155
- Some behaviors that were kept to maintain compatibility with `OSQL` may be changed, such as alignment of column headers for some data types.
155156
- All commands must fit on one line, even `EXIT`. Interactive mode will not check for open parentheses or quotes for commands and prompt for successive lines. The ODBC sqlcmd allows the query run by `EXIT(query)` to span multiple lines.
156157
- `-i` doesn't handle a comma `,` in a file name correctly unless the file name argument is triple quoted. For example:
@@ -255,6 +256,79 @@ To see a list of available styles along with colored syntax samples, use this co
255256
:list color
256257
```
257258

259+
### Code Page Support
260+
261+
The `-f` flag specifies the code page for reading input files and writing output. This is useful when working with SQL scripts saved in legacy encodings or when output needs to be in a specific encoding.
262+
263+
#### Format
264+
265+
```
266+
-f codepage # Set both input and output to the same codepage
267+
-f i:codepage # Set input codepage only
268+
-f o:codepage # Set output codepage only
269+
-f i:codepage,o:codepage # Set input and output to different codepages
270+
-f o:codepage,i:codepage # Same as above (order doesn't matter)
271+
```
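The way these forms resolve can be pictured with the stdlib-only Go sketch below. `parseCodePageArg` is a hypothetical helper, not the `sqlcmd.ParseCodePage` function that backs `-f`. It skips validation against the supported code pages and, like that function, applies entries left to right so that a later entry overrides an earlier one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCodePageArg is a hypothetical helper that splits a -f argument into
// input and output code pages. A zero value means "not set". Entries are
// applied left to right, so a later entry overrides an earlier one.
func parseCodePageArg(arg string) (in, out int, err error) {
	for _, part := range strings.Split(arg, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		lower := strings.ToLower(part)
		switch {
		case strings.HasPrefix(lower, "i:"): // input only
			in, err = strconv.Atoi(lower[2:])
		case strings.HasPrefix(lower, "o:"): // output only
			out, err = strconv.Atoi(lower[2:])
		default: // bare number sets both
			in, err = strconv.Atoi(part)
			out = in
		}
		if err != nil {
			return 0, 0, fmt.Errorf("invalid codepage: %s", part)
		}
	}
	if in == 0 && out == 0 {
		return 0, 0, fmt.Errorf("invalid codepage: %s", arg)
	}
	return in, out, nil
}

func main() {
	for _, arg := range []string{"1252", "i:1252", "o:65001", "i:1200,o:65001", "o:65001,i:1200"} {
		in, out, err := parseCodePageArg(arg)
		fmt.Printf("%-16s in=%d out=%d err=%v\n", arg, in, out, err)
	}
}
```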
272+
273+
#### Common Code Pages
274+
275+
| Code Page | Name | Description |
276+
|-----------|------|-------------|
277+
| 65001 | UTF-8 | Unicode (UTF-8); go-sqlcmd's default when no BOM is present and `-f` is not given |
278+
| 1200 | UTF-16LE | Unicode (UTF-16 Little-Endian) |
279+
| 1201 | UTF-16BE | Unicode (UTF-16 Big-Endian) |
280+
| 1252 | Windows-1252 | Western European (Windows) |
281+
| 932 | Shift_JIS | Japanese |
282+
| 936 | GBK | Chinese Simplified |
283+
| 949 | EUC-KR | Korean |
284+
| 950 | Big5 | Chinese Traditional |
285+
| 437 | CP437 | OEM United States (DOS) |
286+
287+
#### Examples
288+
289+
**Run a script saved in Windows-1252 encoding:**
290+
```bash
291+
sqlcmd -S myserver -i legacy_script.sql -f 1252
292+
```
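To illustrate what `-f 1252` does to the bytes of an input file, here is a stdlib-only sketch of a Windows-1252 decoder. go-sqlcmd itself uses `charmap.Windows1252` from `golang.org/x/text`; the hypothetical `decode1252` below is only for illustration and, as an assumption, maps the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD.

```go
package main

import (
	"fmt"
	"strings"
)

// cp1252High lists the characters Windows-1252 assigns to bytes 0x80-0x9F.
// Zero marks the five bytes the code page leaves unassigned.
var cp1252High = [32]rune{
	0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0,
	0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178,
}

// decode1252 converts Windows-1252 bytes into a UTF-8 Go string.
func decode1252(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		if c >= 0x80 && c <= 0x9F {
			r := cp1252High[c-0x80]
			if r == 0 {
				r = '\uFFFD' // unassigned byte
			}
			sb.WriteRune(r)
			continue
		}
		// 0x00-0x7F is ASCII; 0xA0-0xFF matches Latin-1 (U+00A0-U+00FF).
		sb.WriteRune(rune(c))
	}
	return sb.String()
}

func main() {
	// "Café €5" as it would be stored in a Windows-1252 .sql file.
	raw := []byte{'C', 'a', 'f', 0xE9, ' ', 0x80, '5'}
	fmt.Println(decode1252(raw))
}
```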
293+
294+
**Read UTF-16 input file and write UTF-8 output:**
295+
```bash
296+
sqlcmd -S myserver -i unicode_script.sql -o results.txt -f i:1200,o:65001
297+
```
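For the `i:1200` case, decoding means pairing bytes into little-endian 16-bit units and resolving surrogate pairs. The stdlib-only sketch below uses a hypothetical `decodeUTF16LE` helper; go-sqlcmd itself uses `unicode.UTF16` from `golang.org/x/text`. The sketch strips a leading BOM if present and, as a simplification, ignores a trailing odd byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE converts UTF-16LE bytes into a UTF-8 Go string.
func decodeUTF16LE(b []byte) string {
	// Skip the UTF-16LE byte order mark (FF FE) if present.
	if len(b) >= 2 && b[0] == 0xFF && b[1] == 0xFE {
		b = b[2:]
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// utf16.Decode joins surrogate pairs and maps unpaired surrogates to U+FFFD.
	return string(utf16.Decode(units))
}

func main() {
	// BOM, then "hi", U+00E9, and U+1F600 (surrogate pair D83D DE00).
	raw := []byte{0xFF, 0xFE, 'h', 0, 'i', 0, 0xE9, 0x00, 0x3D, 0xD8, 0x00, 0xDE}
	fmt.Println(decodeUTF16LE(raw))
}
```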
298+
299+
**Process a Japanese Shift-JIS encoded script:**
300+
```bash
301+
sqlcmd -S myserver -i japanese_data.sql -f 932
302+
```
303+
304+
**Write output in Windows-1252 for legacy applications:**
305+
```bash
306+
sqlcmd -S myserver -Q "SELECT * FROM Products" -o report.txt -f o:1252
307+
```
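The output direction, `o:1252`, has to cope with characters that Windows-1252 cannot represent. In the stdlib-only sketch below, the hypothetical `encode1252` writes `?` for such characters; that substitution is an assumption for illustration and is not necessarily how go-sqlcmd's `golang.org/x/text` encoder handles them.

```go
package main

import "fmt"

// cp1252High lists the characters Windows-1252 assigns to bytes 0x80-0x9F.
// Zero marks the five bytes the code page leaves unassigned.
var cp1252High = [32]rune{
	0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0,
	0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178,
}

// cp1252Reverse maps each of those characters back to its byte.
var cp1252Reverse = func() map[rune]byte {
	m := make(map[rune]byte, len(cp1252High))
	for i, r := range cp1252High {
		if r != 0 {
			m[r] = byte(0x80 + i)
		}
	}
	return m
}()

// encode1252 converts a Go string to Windows-1252 bytes, writing '?' for
// characters the code page cannot represent.
func encode1252(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		switch {
		case r < 0x80 || (r >= 0xA0 && r <= 0xFF):
			// ASCII and Latin-1 characters keep their code point as the byte value.
			out = append(out, byte(r))
		default:
			if b, ok := cp1252Reverse[r]; ok {
				out = append(out, b)
			} else {
				out = append(out, '?')
			}
		}
	}
	return out
}

func main() {
	fmt.Printf("% X\n", encode1252("Café €5 日本"))
}
```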
308+
309+
**List all supported code pages:**
310+
```bash
311+
sqlcmd --list-codepages
312+
```
313+
314+
#### Notes
315+
316+
- When `-f` is not specified, sqlcmd checks input files for a UTF-8, UTF-16LE, or UTF-16BE byte order mark (BOM) and switches to the matching decoder. If no BOM is present, the input is read as UTF-8.
317+
- UTF-8 input files with BOM are handled automatically.
318+
- On Windows, additional codepages installed on the system are available via the Windows API, even if not shown by `--list-codepages`.
319+
- Use `--list-codepages` to see the built-in code pages with their names and descriptions.
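The BOM check described in the first note can be sketched with only the standard library. `detectBOM` is a hypothetical helper that returns the detected encoding and the number of bytes to skip; with no BOM it falls back to UTF-8, the default described above.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectBOM inspects the start of a file and reports which encoding its
// byte order mark indicates and how many bytes the mark occupies.
func detectBOM(data []byte) (encoding string, bomLen int) {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	default:
		// No BOM: assume UTF-8, the default when -f is not given.
		return "UTF-8", 0
	}
}

func main() {
	samples := [][]byte{
		{0xEF, 0xBB, 0xBF, 'S', 'E', 'L'},
		{0xFF, 0xFE, 'S', 0x00},
		[]byte("SELECT 1"),
	}
	for _, s := range samples {
		enc, n := detectBOM(s)
		fmt.Printf("%s (skip %d bytes)\n", enc, n)
	}
}
```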
320+
321+
#### Differences from ODBC sqlcmd
322+
323+
| Aspect | ODBC sqlcmd | go-sqlcmd |
324+
|--------|-------------|-----------|
325+
| **Default encoding (no BOM, no `-f`)** | Windows ANSI code page (locale-dependent, e.g., 1252) | UTF-8 |
326+
| **UTF-16 codepages (1200, 1201)** | Rejected by `IsValidCodePage()` API | Accepted |
327+
| **BOM detection** | Yes (UTF-8, UTF-16 LE/BE) | Yes (identical behavior) |
328+
| **`--list-codepages`** | Not available | Available |
329+
330+
**Migration note**: UTF-8 encoded SQL scripts without a BOM that worked with ODBC sqlcmd on Windows should behave the same, or better, with go-sqlcmd, since go-sqlcmd reads them as UTF-8 rather than through the Windows ANSI code page. Scripts saved in a Windows ANSI encoding (e.g., Windows-1252) without a BOM are also read as UTF-8, so if they contain non-ASCII characters you need to pass the matching code page explicitly, for example `-f 1252`.
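Before migrating, it can help to find BOM-less scripts that are not valid UTF-8 and therefore probably need an explicit `-f`. The stdlib-only sketch below uses a hypothetical `needsExplicitCodePage` helper. It is a heuristic: a pure-ASCII file is valid UTF-8 and reads the same under Windows-1252, so it correctly needs no flag, but a legacy file can occasionally form valid UTF-8 by coincidence.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"unicode/utf8"
)

// needsExplicitCodePage reports whether a script without a BOM contains bytes
// that are not valid UTF-8. Such files were likely saved in an ANSI code page
// (for example Windows-1252) and probably need -f when run with go-sqlcmd.
func needsExplicitCodePage(data []byte) bool {
	for _, bom := range [][]byte{{0xEF, 0xBB, 0xBF}, {0xFF, 0xFE}, {0xFE, 0xFF}} {
		if bytes.HasPrefix(data, bom) {
			return false // a BOM is detected automatically
		}
	}
	return !utf8.Valid(data)
}

func main() {
	// Usage: go run . script1.sql script2.sql ...
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		if needsExplicitCodePage(data) {
			fmt.Printf("%s: not valid UTF-8; pass -f with the script's code page (e.g. 1252)\n", path)
		}
	}
}
```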
331+
258332
### Packages
259333

260334
#### sqlcmd executable

cmd/sqlcmd/sqlcmd.go

Lines changed: 27 additions & 0 deletions
@@ -82,6 +82,9 @@ type SQLCmdArguments struct {
8282
ChangePassword string
8383
ChangePasswordAndExit string
8484
TraceFile string
85+
CodePage string
86+
codePageSettings *sqlcmd.CodePageSettings
87+
ListCodePages bool
8588
// Keep Help at the end of the list
8689
Help bool
8790
}
@@ -171,6 +174,12 @@ func (a *SQLCmdArguments) Validate(c *cobra.Command) (err error) {
171174
err = rangeParameterError("-t", fmt.Sprint(a.QueryTimeout), 0, 65534, true)
172175
case a.ServerCertificate != "" && !encryptConnectionAllowsTLS(a.EncryptConnection):
173176
err = localizer.Errorf("The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict).")
177+
case a.CodePage != "":
178+
if codePageSettings, parseErr := sqlcmd.ParseCodePage(a.CodePage); parseErr != nil {
179+
err = localizer.Errorf(`'-f %s': %v`, a.CodePage, parseErr)
180+
} else {
181+
a.codePageSettings = codePageSettings
182+
}
174183
}
175184
}
176185
if err != nil {
@@ -239,6 +248,17 @@ func Execute(version string) {
239248
listLocalServers()
240249
os.Exit(0)
241250
}
251+
// List supported codepages
252+
if args.ListCodePages {
253+
fmt.Println(localizer.Sprintf("Supported Code Pages:"))
254+
fmt.Println()
255+
fmt.Printf("%-8s %-20s %s\n", "Code", "Name", "Description")
256+
fmt.Printf("%-8s %-20s %s\n", "----", "----", "-----------")
257+
for _, cp := range sqlcmd.SupportedCodePages() {
258+
fmt.Printf("%-8d %-20s %s\n", cp.CodePage, cp.Name, cp.Description)
259+
}
260+
os.Exit(0)
261+
}
242262
if len(argss) > 0 {
243263
fmt.Printf("%s'%s': Unknown command. Enter '--help' for command help.", sqlcmdErrorPrefix, argss[0])
244264
os.Exit(1)
@@ -479,6 +499,8 @@ func setFlags(rootCmd *cobra.Command, args *SQLCmdArguments) {
479499
rootCmd.Flags().BoolVarP(&args.EnableColumnEncryption, "enable-column-encryption", "g", false, localizer.Sprintf("Enable column encryption"))
480500
rootCmd.Flags().StringVarP(&args.ChangePassword, "change-password", "z", "", localizer.Sprintf("New password"))
481501
rootCmd.Flags().StringVarP(&args.ChangePasswordAndExit, "change-password-exit", "Z", "", localizer.Sprintf("New password and exit"))
502+
rootCmd.Flags().StringVarP(&args.CodePage, "code-page", "f", "", localizer.Sprintf("Specifies the code page for input/output. Use 65001 for UTF-8. Format: codepage | i:codepage[,o:codepage] | o:codepage[,i:codepage]"))
503+
rootCmd.Flags().BoolVar(&args.ListCodePages, "list-codepages", false, localizer.Sprintf("List supported code pages and exit"))
482504
}
483505

484506
func setScriptVariable(v string) string {
@@ -817,6 +839,11 @@ func run(vars *sqlcmd.Variables, args *SQLCmdArguments) (int, error) {
817839
defer s.StopCloseHandler()
818840
s.UnicodeOutputFile = args.UnicodeOutputFile
819841

842+
// Apply codepage settings (already parsed and validated in Validate)
843+
if args.codePageSettings != nil {
844+
s.CodePage = args.codePageSettings
845+
}
846+
820847
if args.DisableCmd != nil {
821848
s.Cmd.DisableSysCommands(args.errorOnBlockedCmd())
822849
}

cmd/sqlcmd/sqlcmd_test.go

Lines changed: 21 additions & 0 deletions
@@ -123,6 +123,22 @@ func TestValidCommandLineToArgsConversion(t *testing.T) {
123123
{[]string{"-N", "true", "-J", "/path/to/cert2.pem"}, func(args SQLCmdArguments) bool {
124124
return args.EncryptConnection == "true" && args.ServerCertificate == "/path/to/cert2.pem"
125125
}},
126+
// Codepage flag tests
127+
{[]string{"-f", "65001"}, func(args SQLCmdArguments) bool {
128+
return args.CodePage == "65001"
129+
}},
130+
{[]string{"-f", "i:1252,o:65001"}, func(args SQLCmdArguments) bool {
131+
return args.CodePage == "i:1252,o:65001"
132+
}},
133+
{[]string{"-f", "o:65001,i:1252"}, func(args SQLCmdArguments) bool {
134+
return args.CodePage == "o:65001,i:1252"
135+
}},
136+
{[]string{"--code-page", "1252"}, func(args SQLCmdArguments) bool {
137+
return args.CodePage == "1252"
138+
}},
139+
{[]string{"--list-codepages"}, func(args SQLCmdArguments) bool {
140+
return args.ListCodePages
141+
}},
126142
}
127143

128144
for _, test := range commands {
@@ -178,6 +194,11 @@ func TestInvalidCommandLine(t *testing.T) {
178194
{[]string{"-N", "optional", "-J", "/path/to/cert.pem"}, "The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict)."},
179195
{[]string{"-N", "disable", "-J", "/path/to/cert.pem"}, "The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict)."},
180196
{[]string{"-N", "strict", "-F", "myserver.domain.com", "-J", "/path/to/cert.pem"}, "The -F and the -J options are mutually exclusive."},
197+
// Codepage validation tests
198+
{[]string{"-f", "invalid"}, `'-f invalid': invalid codepage: invalid`},
199+
{[]string{"-f", "99999"}, `'-f 99999': unsupported codepage 99999`},
200+
{[]string{"-f", "i:invalid"}, `'-f i:invalid': invalid input codepage: i:invalid`},
201+
{[]string{"-f", "x:1252"}, `'-f x:1252': invalid codepage: x:1252`},
181202
}
182203

183204
for _, test := range commands {

pkg/sqlcmd/codepage.go

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
1+
// Copyright (c) Microsoft Corporation.
2+
// Licensed under the MIT license.
3+
4+
package sqlcmd
5+
6+
import (
7+
"sort"
8+
"strconv"
9+
"strings"
10+
11+
"github.com/microsoft/go-sqlcmd/internal/localizer"
12+
"golang.org/x/text/encoding"
13+
"golang.org/x/text/encoding/charmap"
14+
"golang.org/x/text/encoding/japanese"
15+
"golang.org/x/text/encoding/korean"
16+
"golang.org/x/text/encoding/simplifiedchinese"
17+
"golang.org/x/text/encoding/traditionalchinese"
18+
"golang.org/x/text/encoding/unicode"
19+
)
20+
21+
// codepageEntry defines a codepage with its encoding and metadata
22+
type codepageEntry struct {
23+
encoding encoding.Encoding // nil for UTF-8 (Go's native encoding)
24+
name string
25+
description string
26+
}
27+
28+
// codepageRegistry is the single source of truth for all supported codepages
29+
// that work cross-platform. Both GetEncoding and SupportedCodePages use this
30+
// registry. On Windows, additional codepages installed on the system are also
31+
// available via the Windows API fallback in GetEncoding.
32+
var codepageRegistry = map[int]codepageEntry{
33+
// Unicode
34+
65001: {nil, "UTF-8", "Unicode (UTF-8)"},
35+
1200: {unicode.UTF16(unicode.LittleEndian, unicode.UseBOM), "UTF-16LE", "Unicode (UTF-16 Little-Endian)"},
36+
1201: {unicode.UTF16(unicode.BigEndian, unicode.UseBOM), "UTF-16BE", "Unicode (UTF-16 Big-Endian)"},
37+
38+
// OEM/DOS codepages
39+
437: {charmap.CodePage437, "CP437", "OEM United States"},
40+
850: {charmap.CodePage850, "CP850", "OEM Multilingual Latin 1"},
41+
852: {charmap.CodePage852, "CP852", "OEM Latin 2"},
42+
855: {charmap.CodePage855, "CP855", "OEM Cyrillic"},
43+
858: {charmap.CodePage858, "CP858", "OEM Multilingual Latin 1 + Euro"},
44+
860: {charmap.CodePage860, "CP860", "OEM Portuguese"},
45+
862: {charmap.CodePage862, "CP862", "OEM Hebrew"},
46+
863: {charmap.CodePage863, "CP863", "OEM Canadian French"},
47+
865: {charmap.CodePage865, "CP865", "OEM Nordic"},
48+
866: {charmap.CodePage866, "CP866", "OEM Russian"},
49+
50+
// Windows codepages
51+
874: {charmap.Windows874, "Windows-874", "Thai"},
52+
1250: {charmap.Windows1250, "Windows-1250", "Central European"},
53+
1251: {charmap.Windows1251, "Windows-1251", "Cyrillic"},
54+
1252: {charmap.Windows1252, "Windows-1252", "Western European"},
55+
1253: {charmap.Windows1253, "Windows-1253", "Greek"},
56+
1254: {charmap.Windows1254, "Windows-1254", "Turkish"},
57+
1255: {charmap.Windows1255, "Windows-1255", "Hebrew"},
58+
1256: {charmap.Windows1256, "Windows-1256", "Arabic"},
59+
1257: {charmap.Windows1257, "Windows-1257", "Baltic"},
60+
1258: {charmap.Windows1258, "Windows-1258", "Vietnamese"},
61+
62+
// ISO-8859 codepages
63+
28591: {charmap.ISO8859_1, "ISO-8859-1", "Latin 1 (Western European)"},
64+
28592: {charmap.ISO8859_2, "ISO-8859-2", "Latin 2 (Central European)"},
65+
28593: {charmap.ISO8859_3, "ISO-8859-3", "Latin 3 (South European)"},
66+
28594: {charmap.ISO8859_4, "ISO-8859-4", "Latin 4 (North European)"},
67+
28595: {charmap.ISO8859_5, "ISO-8859-5", "Cyrillic"},
68+
28596: {charmap.ISO8859_6, "ISO-8859-6", "Arabic"},
69+
28597: {charmap.ISO8859_7, "ISO-8859-7", "Greek"},
70+
28598: {charmap.ISO8859_8, "ISO-8859-8", "Hebrew"},
71+
28599: {charmap.ISO8859_9, "ISO-8859-9", "Turkish"},
72+
28600: {charmap.ISO8859_10, "ISO-8859-10", "Nordic"},
73+
28603: {charmap.ISO8859_13, "ISO-8859-13", "Baltic"},
74+
28604: {charmap.ISO8859_14, "ISO-8859-14", "Celtic"},
75+
28605: {charmap.ISO8859_15, "ISO-8859-15", "Latin 9 (Western European with Euro)"},
76+
28606: {charmap.ISO8859_16, "ISO-8859-16", "Latin 10 (South-Eastern European)"},
77+
78+
// Cyrillic
79+
20866: {charmap.KOI8R, "KOI8-R", "Russian"},
80+
21866: {charmap.KOI8U, "KOI8-U", "Ukrainian"},
81+
82+
// Macintosh
83+
10000: {charmap.Macintosh, "Macintosh", "Mac Roman"},
84+
10007: {charmap.MacintoshCyrillic, "x-mac-cyrillic", "Mac Cyrillic"},
85+
86+
// EBCDIC
87+
37: {charmap.CodePage037, "IBM037", "EBCDIC US-Canada"},
88+
1047: {charmap.CodePage1047, "IBM1047", "EBCDIC Latin 1/Open System"},
89+
1140: {charmap.CodePage1140, "IBM01140", "EBCDIC US-Canada with Euro"},
90+
91+
// Japanese
92+
932: {japanese.ShiftJIS, "Shift_JIS", "Japanese (Shift-JIS)"},
93+
20932: {japanese.EUCJP, "EUC-JP", "Japanese (EUC)"},
94+
50220: {japanese.ISO2022JP, "ISO-2022-JP", "Japanese (JIS)"},
95+
50221: {japanese.ISO2022JP, "csISO2022JP", "Japanese (JIS-Allow 1 byte Kana)"},
96+
50222: {japanese.ISO2022JP, "ISO-2022-JP", "Japanese (JIS-Allow 1 byte Kana SO/SI)"},
97+
98+
// Korean
99+
949: {korean.EUCKR, "EUC-KR", "Korean"},
100+
51949: {korean.EUCKR, "EUC-KR", "Korean (EUC)"},
101+
102+
// Simplified Chinese
103+
936: {simplifiedchinese.GBK, "GBK", "Chinese Simplified (GBK)"},
104+
54936: {simplifiedchinese.GB18030, "GB18030", "Chinese Simplified (GB18030)"},
105+
52936: {simplifiedchinese.HZGB2312, "HZ-GB-2312", "Chinese Simplified (HZ)"},
106+
107+
// Traditional Chinese
108+
950: {traditionalchinese.Big5, "Big5", "Chinese Traditional (Big5)"},
109+
}
110+
111+
// CodePageSettings holds the input and output codepage settings
112+
type CodePageSettings struct {
113+
InputCodePage int
114+
OutputCodePage int
115+
}
116+
117+
// ParseCodePage parses the -f codepage argument
118+
// Format: codepage | i:codepage[,o:codepage] | o:codepage[,i:codepage]
119+
func ParseCodePage(arg string) (*CodePageSettings, error) {
120+
if arg == "" {
121+
return nil, nil
122+
}
123+
124+
settings := &CodePageSettings{}
125+
parts := strings.Split(arg, ",")
126+
127+
for _, part := range parts {
128+
part = strings.TrimSpace(part)
129+
if part == "" {
130+
continue
131+
}
132+
133+
if strings.HasPrefix(strings.ToLower(part), "i:") {
134+
// Input codepage
135+
cp, err := strconv.Atoi(strings.TrimPrefix(strings.ToLower(part), "i:"))
136+
if err != nil {
137+
return nil, localizer.Errorf("invalid input codepage: %s", part)
138+
}
139+
settings.InputCodePage = cp
140+
} else if strings.HasPrefix(strings.ToLower(part), "o:") {
141+
// Output codepage
142+
cp, err := strconv.Atoi(strings.TrimPrefix(strings.ToLower(part), "o:"))
143+
if err != nil {
144+
return nil, localizer.Errorf("invalid output codepage: %s", part)
145+
}
146+
settings.OutputCodePage = cp
147+
} else {
148+
// Both input and output
149+
cp, err := strconv.Atoi(part)
150+
if err != nil {
151+
return nil, localizer.Errorf("invalid codepage: %s", part)
152+
}
153+
settings.InputCodePage = cp
154+
settings.OutputCodePage = cp
155+
}
156+
}
157+
158+
// If a non-empty argument was provided but no codepage was parsed,
159+
// treat this as an error rather than silently disabling codepage handling.
160+
if settings.InputCodePage == 0 && settings.OutputCodePage == 0 {
161+
return nil, localizer.Errorf("invalid codepage: %s", arg)
162+
}
163+
164+
// Validate codepages
165+
if settings.InputCodePage != 0 {
166+
if _, err := GetEncoding(settings.InputCodePage); err != nil {
167+
return nil, err
168+
}
169+
}
170+
if settings.OutputCodePage != 0 {
171+
if _, err := GetEncoding(settings.OutputCodePage); err != nil {
172+
return nil, err
173+
}
174+
}
175+
176+
return settings, nil
177+
}
178+
179+
// GetEncoding returns the encoding for a given Windows codepage number.
180+
// Returns nil for UTF-8 (65001) since Go uses UTF-8 natively.
181+
// If the codepage is not in the built-in registry, falls back to
182+
// OS-specific support (Windows API on Windows, error on other platforms).
183+
func GetEncoding(codepage int) (encoding.Encoding, error) {
184+
entry, ok := codepageRegistry[codepage]
185+
if !ok {
186+
// Fallback to system-provided codepage support
187+
return getSystemCodePageEncoding(codepage)
188+
}
189+
return entry.encoding, nil
190+
}
191+
192+
// CodePageInfo describes a supported codepage
193+
type CodePageInfo struct {
194+
CodePage int
195+
Name string
196+
Description string
197+
}
198+
199+
// SupportedCodePages returns a list of all supported codepages with descriptions
200+
func SupportedCodePages() []CodePageInfo {
201+
result := make([]CodePageInfo, 0, len(codepageRegistry))
202+
for cp, entry := range codepageRegistry {
203+
result = append(result, CodePageInfo{
204+
CodePage: cp,
205+
Name: entry.name,
206+
Description: entry.description,
207+
})
208+
}
209+
// Sort by codepage number for consistent output
210+
sort.Slice(result, func(i, j int) bool {
211+
return result[i].CodePage < result[j].CodePage
212+
})
213+
return result
214+
}
