SQL Server 2016 and later
2. When the table has been created, use the following statement to query the table:
sql
SELECT * FROM #MyData;
Results
Col1
1
10
100
It should return the same values but with a new column name.
Notes
The @language parameter defines the language extension to call, in this case, R.
In the @script parameter, you define the commands to pass to the R runtime as Unicode text. You can also add the text to
a variable of type nvarchar and then call the variable.
The line N'OutputDataSet <- InputDataSet;' passes the input data, contained in the default variable name
InputDataSet, to R and then back to the results without any further operations. Note that R is case-sensitive; therefore,
both the input and output variable names must use the correct casing or an error will be raised.
To specify a different input or output variable, use the @input_data_1_name or @output_data_1_name parameter, and type a
valid SQL identifier. In the following example, the names of the output and input variables have been changed to SQLOut
and SQLIn respectively:
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'SQLOut <- SQLIn;'
    , @input_data_1 = N'SELECT 12 as Col;'
    , @input_data_1_name = N'SQLIn'
    , @output_data_1_name = N'SQLOut'
WITH RESULT SETS (([NewColName] int NOT NULL));
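The renaming and the case-sensitivity rule can be tried in a plain R session. This is only a sketch: the data frame assigned to SQLIn is a stand-in (an assumption) for what SQL Server would bind to the input name, and no SQL Server call is involved.

```r
# Stand-in for the data frame that SQL Server would bind to the input name.
SQLIn <- data.frame(Col = 12L)

# Same body as the @script value in the statement above.
SQLOut <- SQLIn

# R names are case-sensitive: only the exact spelling refers to the object.
exists("SQLOut")  # TRUE
exists("sqlout")  # FALSE
```

If the name given in @output_data_1_name differs from the variable the script assigns, even only in casing, the script does not produce the expected output variable.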
Notes
The required parameters @input_data_1 and @output_data_1 must be specified first, in order to use the optional
parameters @input_data_1_name and @output_data_1_name.
SQL and R do not support the same data types; therefore, type conversions very often take place when sending data from
SQL Server to R and vice versa. For more information, see Working with R Data Types.
Only one input dataset can be passed as a parameter, and you can return only one dataset. However, you can call other
datasets from inside your R code and you can return outputs of other types in addition to the dataset. You can also add
the OUTPUT keyword to any parameter to have it returned with the results.
The schema for the returned dataset (an R data.frame) is defined by the WITH RESULT SETS statement. Try omitting this and
see what happens.
Tabular results are returned in the Values pane. Messages returned by the R runtime are provided in the Messages pane.
Results
col
hello
(blank)
world
Now, try a different version of the Hello World sample provided above.
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'OutputDataSet <- data.frame(c("hello"), "", c("world"));'
    , @input_data_1 = N''
WITH RESULT SETS (([col1] varchar(20), [col2] char(1), [col3] varchar(20)));
Results
col1    col2    col3
hello           world
Note that both statements create a vector with three values, but the second example returns three columns with a single row,
and the first returns a single column with three rows. Why?
The reason is that R provides many ways to work with columns of values: vectors, matrices, arrays, and lists. These operations,
while powerful and flexible, do not always conform to the expectations of SQL developers. Some R functions will perform implicit
data object conversions on lists and matrices.
Tip
Always verify your results and determine how many columns of data your R code will return, and what the data types will be.
Regardless of whether your R code uses matrices, vectors, or some other data structure, remember that the result that is
output from the R script to the stored procedure must be a data.frame.
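The difference between the two shapes can be checked in a plain R session before the script is wrapped in the stored procedure. The sketch below assumes the first statement wrapped a single three-element vector with as.data.frame(); the variable names as_rows and as_cols are illustrative only.

```r
# One character vector with three elements, converted to a data frame:
# each element becomes a row (assumed form of the first statement).
as_rows <- as.data.frame(c("hello", "", "world"))

# Three separate arguments to data.frame(): each argument becomes a column.
as_cols <- data.frame(c("hello"), "", c("world"))

dim(as_rows)  # rows, then columns
dim(as_cols)

# Inspect the column types before writing the WITH RESULT SETS schema.
sapply(as_cols, class)
```

Checking dim() and the column classes in this way gives the column count and types that the WITH RESULT SETS clause has to match.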
R and SQL Server don't use the same data types, so you must be aware of the restrictions when you move data between R and
the database. The following examples illustrate some common issues.
1. Run the following statement to perform matrix multiplication using R. In this script, the single column of three values is
converted to a single-column matrix. Then, R implicitly treats the second variable, y, as a single-row matrix to make
the two arguments conform, so the result has three rows and four columns.
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'
        x <- as.matrix(InputDataSet);
        y <- array(12:15);
        OutputDataSet <- as.data.frame(x %*% y);'
    , @input_data_1 = N'SELECT [Col1] from #MyData;'
WITH RESULT SETS (([Col1] int, [Col2] int, [Col3] int, Col4 int));
Results
Col1    Col2    Col3    Col4
12      13      14      15
120     130     140     150
1200    1300    1400    1500
2. Now run the next script, which is similar, and see what happens when you change the length of the array.
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'
        x <- as.matrix(InputDataSet);
        y <- array(12:14);
        OutputDataSet <- as.data.frame(y %*% x);'
    , @input_data_1 = N'SELECT [Col1] from #MyData;'
WITH RESULT SETS (([Col1] int));
Results
Col1
1542
This time R returns a single value as the result. This result is valid because the two arguments are vectors of the same length;
therefore, R will return the inner product as a matrix.
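Both products can be reproduced in a plain R session without SQL Server. In this sketch the data frame is a stand-in for the #MyData input (values 1, 10, 100, as shown in the query results earlier), and the names y4, y3, outer_result, and inner_result are illustrative.

```r
# Stand-in for the data frame SQL Server passes in as InputDataSet.
InputDataSet <- data.frame(Col1 = c(1L, 10L, 100L))
x <- as.matrix(InputDataSet)   # 3 x 1 matrix

# Length-4 array on the right: treated as a single row, giving a 3 x 4 result.
y4 <- array(12:15)
outer_result <- x %*% y4
dim(outer_result)

# Length-3 array on the left: treated as a single row, giving a 1 x 1 result.
y3 <- array(12:14)
inner_result <- y3 %*% x
inner_result                   # 12*1 + 13*10 + 14*100 = 1542
```

Calling dim() on the product before converting it with as.data.frame() shows how many columns the WITH RESULT SETS clause needs to declare.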
Results
Col2
Col3
10
100
10
100
There are many functions in R that create tabular output but perform quite different operations on the values depending on the
R data object. Because this Transact-SQL stored procedure requires that both inputs and outputs be passed as a data.frame, you
will frequently be using functions to convert columns and rows to and from data frames.
If you ever have any doubt as to which R data object is being used, add the R str() function or one of the identity functions
(is.matrix, is.vector, and so on) to inspect the results and get the actual schema and value types.
For more information, see this article by Hadley Wickham on R Data Structures.
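As a rough illustration, the checks below can be run in a plain R session. The data frame is a stand-in for the #MyData input, not something produced by SQL Server.

```r
# Stand-in for the input data frame (values from the #MyData query results).
InputDataSet <- data.frame(Col1 = c(1L, 10L, 100L))
x <- as.matrix(InputDataSet)

is.data.frame(InputDataSet)   # TRUE
is.matrix(x)                  # TRUE
is.data.frame(x)              # FALSE: as.matrix() dropped the data.frame class
is.vector(InputDataSet$Col1)  # TRUE: a single column is a plain vector

# str() prints a compact description of the object and its column types.
str(InputDataSet)
```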
You can use the function str() in your R script to have the data schema of the R object returned as an informational message.
For example, the following statement returns the schema of the #MyData table.
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'str(InputDataSet);'
    , @input_data_1 = N'SELECT * FROM #MyData;'
WITH RESULT SETS undefined;
Results
STDOUT message (s) from external script:
'data.frame': 3 obs. of 1 variable:
STDOUT message (s) from external script:
$ Col1: int 1 10 100
2. Now review the results of the str function to see how R handled the input data.
Results
We recommend that you use RODBC to get smaller datasets, such as lookups or lists of factors, and use the @input_data
parameter to get larger datasets, such as those used for training a model, from SQL Server.
This statement calls the function from T-SQL and outputs the results to SQL Server.
sql
EXEC sp_execute_external_script
    @language = N'R'
    , @script = N'
        OutputDataSet <- as.data.frame(rnorm(20, mean = 100));'
    , @input_data_1 = N';'
WITH RESULT SETS (([Density] float NOT NULL));
Next, you wrap the stored procedure in another stored procedure to make it easier to pass in parameters. You must define each
of the input parameters in the @params argument, and map each parameter to its corresponding R parameter by name.
sql
CREATE PROCEDURE MyRNorm (@mynorm int, @mymean int)
AS
    EXEC sp_execute_external_script
        @language = N'R'
        , @script = N'
            OutputDataSet <- as.data.frame(rnorm(mynorm, mymean));'
        , @input_data_1 = N';'
        , @params = N'@mynorm int, @mymean int'
        , @mynorm = @mynorm
        , @mymean = @mymean
    WITH RESULT SETS (([Density] float NOT NULL));
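On the R side, the mapped parameters simply appear as variables named mynorm and mymean. A minimal sketch of the same logic as an ordinary R function follows; the function name my_rnorm is hypothetical and is not part of the stored procedure.

```r
# Hypothetical R-side equivalent of the script body: the two mapped
# parameters arrive as ordinary R variables with the same names.
my_rnorm <- function(mynorm, mymean) {
  # rnorm(n, mean): n draws from a normal distribution; sd defaults to 1.
  as.data.frame(rnorm(mynorm, mymean))
}

set.seed(42)                   # only to make the sketch repeatable
result <- my_rnorm(20, 100)
dim(result)                    # one row per draw, in a single column
```

The single numeric column corresponds to the one float column declared in WITH RESULT SETS.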
The next example gets the largest integer value that is supported on the current computer, using the R .Machine
variable, and outputs it to the console.
R
localmax <- .Machine$integer.max;
localmax;
sql
execute sp_execute_external_script
    @language = N'R'
    , @script = N'
        localmax <- .Machine$integer.max;
        OutputDataSet <- as.data.frame(localmax);'
    , @input_data_1 = N'select [Col1] from #MyData;'
WITH RESULT SETS (([MaxIntValue] int not null));
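The value can also be inspected in a plain R session. This sketch additionally shows what happens just past the limit; the variable name overflow is illustrative.

```r
localmax <- .Machine$integer.max
localmax                # 2147483647, the same upper bound as the SQL Server int type
is.integer(localmax)    # TRUE

# Integer arithmetic past the limit yields NA (with a warning), not a larger value.
overflow <- suppressWarnings(localmax + 1L)
is.na(overflow)         # TRUE

# Wrapped as a one-row, one-column data frame, as in the script above.
OutputDataSet <- as.data.frame(localmax)
```

Because the value fits the SQL Server int type exactly, the int column declared in WITH RESULT SETS can hold it.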
However, it isn't always the case that R will do the job better. Set-based operations in SQL Server might be far more efficient for
some operations that data scientists would traditionally perform in R. For an example of a performance comparison of R
functions and T-SQL custom functions, see the Data Science End-to-End solution.
We recommend that you evaluate on a case-by-case basis whether it makes more sense to perform a given operation using R,
using T-SQL, or some other tool.
Additional Resources
Data Science Deep Dive: Using the RevoScaleR packages: This walkthrough provides hands-on experience with common data
science tasks.
Data Science End-to-End solution: This walkthrough illustrates a development and deployment process that balances SQL and R
approaches.
Advanced analytics for the SQL Developer: Illustrates the complete model operationalization process for the SQL developer.
© 2016 Microsoft