Why can't object names start with a number?
Why can't object names start with a number?
For example, if I'm creating view with a name '4aii'
, why does SQL Server care that it starts with a 4
? I could call the table Fouraii
or IVaii
.
'4aii'
4
Fouraii
IVaii
Additionally, what does do behind the scenes to allow for any string to be used as a name?
A string's a string, amirite?
4 Answers
4
A string's a string, amirite?
Yes and No: a string is a string, but object / item names are not strings. So while that statement is true, it is also not relevant to the behavior that you are seeing.
Ignoring the conceptual reasoning for the specific rules, the technical answer to "why one works and not the other" is that SQL Server follows (with minimal customization), the Unicode Standard's guidelines for identifiers. The Unicode documentation can be found here:
Unicode® Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX
Identifiers that are not enclosed in either [...]
or "..."
are "regular" identifiers, while those that are enclosed are "delimited" identifiers. Regular identifiers are names that are valid in all contexts (i.e. these are the rules for naming things in this language, software, etc). Delimited identifiers are everything else: names that are not valid and should not work, however, they are given an exemption if you wrap them in either of those delimiters. Most identifiers can be delimited; it is only GOTO
labels and variables (including table variables) / parameters that cannot be delimited. The distinction seems to be that identifiers that exist purely for use in the T-SQL language (i.e. not a name that will ever be stored in a data file or log file as meta-data) cannot be delimited (much as you would expect in any language).
[...]
"..."
GOTO
Now, the SQL Server documentation isn't exactly complete / correct, but it is correct about the classification of what is a valid "identifier" character (both starting and continuing) coming from the Unicode 3.2. If you want the actual list of rules for both regular and delimited identifiers, I have them documented here:
Completely Complete List of Rules for T-SQL Identifiers
To see the research that proves the relationship between the Unicode 3.2 categorizations and what SQL Server accepts for regular identifiers, please visit:
Addressing concerns noted in comments on this answer:
_
#
@
_
#
@
$
@Variable
#TempTable
Ident_Start
Ident_Continue
ALSO, with regards to the question as stated in the title, it depends on how loosely you define "number". Meaning, if you follow along with the research steps as shown in the two posts noted directly above, such that you have created a table to hold the Unicode Character Database v3.2 and a few additional properties, you can get a list of 52 non-letters (mostly "numbers") that are valid characters for starting an identifier via the following query:
SELECT ucd.*
FROM [v3-2].UnicodeCharacterDatabase ucd
WHERE ucd.[IDStart] = 1
AND ucd.[GeneralCategory] NOT LIKE 'L%';
Choosing a few of those characters to test, we can see that they do indeed work:
USE [tempdb];
CREATE TABLE dbo.Ⅳaii ([Col1] INT); -- ROMAN NUMERAL FOUR (U+2163)
CREATE TABLE dbo.ↂaii ([Col1] INT); -- ROMAN NUMERAL TEN THOUSAND (U+2182)
CREATE TABLE dbo.〤aii ([Col1] INT); -- HANGZHOU NUMERAL FOUR (U+3024)
And, just to show that they are "numbers" in more than just their names, the following query proves that they are assigned a numeric value (as shown in the NumericValue
column of the [v3-2].UnicodeCharacterDatabase
table:
NumericValue
[v3-2].UnicodeCharacterDatabase
SELECT 1 WHERE N'〤' LIKE N'[3-5]'; -- HANGZHOU NUMERAL FOUR (U+3024)
-- 1
However, they are not numbers that can be used in numeric operations:
SELECT 〤 + 0;
/*
Msg 207, Level 16, State 1, Line 23
Invalid column name '〤'.
*/
Regarding the issue of parsing and needing to be able to determine if 3e2
is a number or identifier: while this is a consideration, and possibly why numbers are excluded from the "Ident_start" Unicode general category, it is not a universal, and not necessarily why SQL Server excludes them. Three points to consider:
3e2
3e2
dbo.3e2
4aii
MySQL / MariaDB do not have this restriction. They allow for non-delimited identifiers such as 4aii
and 3e
, but not 3e2
or 300
. I was able to successfully execute the following in MySQL:
4aii
3e
3e2
300
create table 4aii (3e int);
So again, the reason that you cannot do this in SQL Server is because SQL Server adheres to the Unicode Standard's recommendation for identifiers. Why those characters were chosen by the Unicode Consortium is not specifically stated, but seems to be at least "best practice". Still, as proven with MySQL, it is possible to parse identifiers that start with a number.
Wouldn't it be correcter to say that SQL Server follows the ISO-9075 (SQL standard) rules for identifiers than that it follows the Unicode identifier rules? Or is this explicitly specified somewhere in the SQL Server documentation?
– Mark Rotteveel
Sep 13 '18 at 10:21
@MarkRotteveel Not sure about the ISO-9075 rules as I cannot find a definition of them. However, the SQL Server documentation does mention
A letter as defined by the Unicode Standard 3.2
, which admittedly is not the entire set of rules, but does point in that direction. And even if the ISO spec mentions these exact rules, the origin of the rules is still the Unicode Identifier document.– Solomon Rutzky
Sep 13 '18 at 15:54
A letter as defined by the Unicode Standard 3.2
I don't think this is true, in fact I think it's kind of tangential. Just look at the temporary table syntax
CREATE TABLE #foo ( a int )
in that, #foo
is an identifier that would require delimitation if they implemented the Unicode spec. They obviously use Unicode internally for their lexer, but I have no reason to believe their goal is compliance with anything. Also look at Other_ID_Start
none of those work in identifiers.– Evan Carroll
Sep 13 '18 at 16:44
CREATE TABLE #foo ( a int )
#foo
Other_ID_Start
@EvanCarroll The Unicode spec specifically talks about various customizations for various reasons. And in fact, the section mentioning it, 1.2 Customization even states: "Each programming language standard has its own identifier syntax; different programming languages have different conventions for the use of certain characters such as $, @, #, and _ in identifiers.". Those 4 example characters are the exact same 4 customizations used in SQL Server. So yes, very much true :-). Read my blog posts, the categorization is specific to Unicode 3.2.
– Solomon Rutzky
Sep 13 '18 at 16:56
Firstly, you need to distinguish between numbers (numeric literals), strings (string literals) and identifiers. '4aii'
is a string literal, which can be a value of some "thing", but it does not identify (name) a thing. 4aii
or [4aii]
would be identifiers (if it were allowed).
'4aii'
4aii
[4aii]
The query parser needs to understand the meaning of a token that it is looking at. By allowing names to begin with digits, you by extension allow them to consist of digits exclusively. Then, given select 12345 from mytable
, how would you (and the parser) know if 12345
is an integer literal or a name of a column?
select 12345 from mytable
12345
However, if you allow identifiers to begin with letters only (or underscore characters), you can unambiguously say if you're looking at an identifier (abc123
) or a string literal ('abc123'
) -- the latter is enclosed in quotation marks.
abc123
'abc123'
Square brackets in SQL Server, backticks (`) in MySQL, and double quotes in ANSI SQL-compliant engines, signify identifiers, and you use them for when your identifiers cannot be readily distinguished from other tokens: start with a digit, have spaces or other special characters in them etc. Hence, [4aii]
or "4aii"
clearly tell the parser that it's dealing with an identifier.
[4aii]
"4aii"
A little dbfiddle demo.
FYI: MySQL does allow for identifiers to start with a decimal digit. Please see my answer (towards the bottom) where I address this. It is valid to create a table and/or column as just
4aii
, without delimiters, in MySQL. But you cannot do 3e2
or 300
.– Solomon Rutzky
Sep 13 '18 at 19:05
4aii
3e2
300
@SolomonRutzky I'm not saying that it doesn't; I'm saying it has its own way to quote identifiers.
– mustaccio
Sep 13 '18 at 19:29
Also, in some DBMS
select [2]
can mean "give me the second column" what can add confusion to the poor parser– jean
Sep 13 '18 at 19:55
select [2]
@mustaccio Actually, to clarify: 1) "By allowing names to begin with digits, you by extension allow them to consist of digits exclusively.": potentially, but not necessarily. If this were the case, then MySQL wouldn't allow for non-delimited names starting with digits. 2) "if you allow identifiers to begin with letters only (or underscore characters), you can unambiguously say if you're looking at an identifier": there are 52 letter-like numbers (but still numbers, not letters) that are valid first characters. Unicode category "Nl" (again, per the v3.2 definitions).
– Solomon Rutzky
Sep 14 '18 at 7:12
What you're observing are the implementation's lexer rules. It's a part of a process called lexical analysis which is a fancy way of saying "making sense of things." Ideally, this would adhere to the rules given in the SQL Spec (<identifier>
). These rules are all published by Microsoft as the Rules for Regular Identifiers. If you wish to use irregular identifiers you have to quote them or "delimit" them from other tokens (Tsql's or double-quotes
""
) which eliminates any possibilities of ambiguous syntax.
<identifier>
""
A string's a string, amirite?
No, take for example this.
"No, take for example this."
That's one sentence. But, more importantly that's 5 words. You know it's five words because the whitespace is significant. You'll have to know it's five words if you're going to parse out the subjects, objects, and voice to make sense of it as an instruction.
A quick example,
3e2
Is that the string "3e2"? The number 300? A variable name? What if you meant the number, and forgot that you wrote 3e2 = 500
earlier in your script?
3e2 = 500
The rule is there so that a syntax parser can understand what you mean. There may be non-ambiguous examples like 4aii
as mentioned in your questions - but there is a subset of labels that are ambigous. So to avoid that ambiguity we have that rule.
4aii
Thanks for contributing an answer to Database Administrators Stack Exchange!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Who is this "amirite"?
– ypercubeᵀᴹ
Sep 13 '18 at 18:59