As a continuation of SQL in R, we wade into into the final and conclusive part after we saw glimpses of this topic in Part 1. For starters, let us begin with the group by clause, two examples of which are shown below:
> sqldf("select am,sum(wt) from mtcars group by am")
am sum(wt)
1 0 71.609
2 1 31.343
> sqldf("select carb,avg(wt) from mtcars group by carb")
carb avg(wt)
1 1 2.4900
2 2 2.8628
3 3 3.8600
4 4 3.8974
5 6 2.7700
6 8 3.5700
Group by clause usage is shown in the next query:
> sqldf("select carb,avg(wt) from mtcars group by carb having avg(wt) >= 3.0")
carb avg(wt)
1 3 3.8600
2 4 3.8974
3 8 3.5700
The next query shows a subquery:
> sqldf("select * from mtcars where disp = (select max(disp) from mtcars)")
mpg cyl disp hp drat wt qsec vs am gear carb
1 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4
Set operations like union can be performed and an example is shown below:
> sqldf("select * from mtcars where disp = (select max(disp) from mtcars) union select * from mtcars where disp = (select min(disp) from mtcars)", row.names = TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
The last query gives the details of the vehicle having the maximum and minimum displacement in cubic inches.
The next query shows a correlated subquery:
> sqldf("select * from employee_data as e1 where salary > (select avg(salary) from employee_data where department_id = e1.department_id)")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
3 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
4 108 Nancy Greenberg NGREENBE 515.124.4569 2002-08-17 FI_MGR 12008
5 109 Daniel Faviet DFAVIET 515.124.4169 2002-08-16 FI_ACCOUNT 9000
6 114 Den Raphaely DRAPHEAL 515.127.4561 2002-12-07 PU_MAN 11000
7 120 Matthew Weiss MWEISS 650.123.1234 2004-07-18 ST_MAN 8000
8 121 Adam Fripp AFRIPP 650.123.2234 2005-04-10 ST_MAN 8200
9 122 Payam Kaufling PKAUFLIN 650.123.3234 2003-05-01 ST_MAN 7900
10 123 Shanta Vollman SVOLLMAN 650.123.4234 2005-10-10 ST_MAN 6500
11 124 Kevin Mourgos KMOURGOS 650.123.5234 2007-11-16 ST_MAN 5800
12 137 Renske Ladwig RLADWIG 650.121.1234 2003-07-14 ST_CLERK 3600
13 141 Trenna Rajs TRAJS 650.121.8009 2003-10-17 ST_CLERK 3500
14 145 John Russell JRUSSEL 011.44.1344.429268 2004-10-01 SA_MAN 14000
15 146 Karen Partners KPARTNER 011.44.1344.467268 2005-01-05 SA_MAN 13500
16 147 Alberto Errazuriz AERRAZUR 011.44.1344.429278 2005-03-10 SA_MAN 12000
17 148 Gerald Cambrault GCAMBRAU 011.44.1344.619268 2007-10-15 SA_MAN 11000
18 149 Eleni Zlotkey EZLOTKEY 011.44.1344.429018 2008-01-29 SA_MAN 10500
19 150 Peter Tucker PTUCKER 011.44.1344.129268 2005-01-30 SA_REP 10000
20 151 David Bernstein DBERNSTE 011.44.1344.345268 2005-03-24 SA_REP 9500
21 152 Peter Hall PHALL 011.44.1344.478968 2005-08-20 SA_REP 9000
22 156 Janette King JKING 011.44.1345.429268 2004-01-30 SA_REP 10000
23 157 Patrick Sully PSULLY 011.44.1345.929268 2004-03-04 SA_REP 9500
24 158 Allan McEwen AMCEWEN 011.44.1345.829268 2004-08-01 SA_REP 9000
25 162 Clara Vishney CVISHNEY 011.44.1346.129268 2005-11-11 SA_REP 10500
26 163 Danielle Greene DGREENE 011.44.1346.229268 2007-03-19 SA_REP 9500
27 168 Lisa Ozer LOZER 011.44.1343.929268 2005-03-11 SA_REP 11500
28 169 Harrison Bloom HBLOOM 011.44.1343.829268 2006-03-23 SA_REP 10000
29 170 Tayler Fox TFOX 011.44.1343.729268 2006-01-24 SA_REP 9600
30 174 Ellen Abel EABEL 011.44.1644.429267 2004-05-11 SA_REP 11000
31 184 Nandita Sarchand NSARCHAN 650.509.1876 2004-01-27 SH_CLERK 4200
32 185 Alexis Bull ABULL 650.509.2876 2005-02-20 SH_CLERK 4100
33 188 Kelly Chung KCHUNG 650.505.1876 2005-06-14 SH_CLERK 3800
34 189 Jennifer Dilly JDILLY 650.505.2876 2005-08-13 SH_CLERK 3600
35 192 Sarah Bell SBELL 650.501.1876 2004-02-04 SH_CLERK 4000
36 193 Britney Everett BEVERETT 650.501.2876 2005-03-03 SH_CLERK 3900
37 201 Michael Hartstein MHARTSTE 515.123.5555 2004-02-17 MK_MAN 13000
38 205 Shelley Higgins SHIGGINS 515.123.8080 2002-06-07 AC_MGR 12008
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
1 NA NA 90
2 NA 102 60
3 NA 103 60
4 NA 101 100
5 NA 108 100
6 NA 100 30
7 NA 100 50
8 NA 100 50
9 NA 100 50
10 NA 100 50
11 NA 100 50
12 NA 123 50
13 NA 124 50
14 0.40 100 80
15 0.30 100 80
16 0.30 100 80
17 0.30 100 80
18 0.20 100 80
19 0.30 145 80
20 0.25 145 80
21 0.25 145 80
22 0.35 146 80
23 0.35 146 80
24 0.35 146 80
25 0.25 147 80
26 0.15 147 80
27 0.25 148 80
28 0.20 148 80
29 0.20 148 80
30 0.30 149 80
31 NA 121 50
32 NA 121 50
33 NA 122 50
34 NA 122 50
35 NA 123 50
36 NA 123 50
37 NA 100 20
38 NA 101 110
The above query returns those employees that have salaries above the average salaries in their respective departments. In the next query we see how we can create a temporary table and use that temporary table for querying.
> sqldf(c("create temp table temp_1 as select * from mtcars where qsec > 17.0", "select gear,avg(disp) from temp_1 group by gear"))
gear avg(disp)
1 3 322.2083
2 4 119.6545
In the above query, a temporary table called temp_1 that only exists for querying in this command . The next query uses temp_1 to calculate the average of displacement by gear.
To illustrate update feature, we a dataframe called update_sample and update the missing values in second column from the values in first column.
> update_sample <- data.frame(a = 1:5, b = c(3, NA, 5, NA, 7))
> update_sample
a b
1 1 3
2 2 NA
3 3 5
4 4 NA
5 5 7
> sqldf(c("update update_sample set b = a where b is null", "select * from update_sample"))
a b
1 1 3
2 2 2
3 3 5
4 4 4
5 5 7
It is to be remembered that update_sample is unchanged and the updated record set is only to be used in the second query as shown above. This behavior is the same for the insert and delete statements. The insert as shown below inserts a new row with values 15 and NA:
> sqldf(c("insert into update_sample (a,b) values (15,NULL)", "select * from update_sample"))
a b
1 1 3
2 2 NA
3 3 5
4 4 NA
5 5 7
6 15 NA
The delete keyword deletes a row that has a equals 4 as shown below:
> sqldf(c("delete from update_sample", "select * from update_sample"))
[1] a b
<0 rows> (or 0-length row.names)
In the last section, we explore joins. Let us import some data from Oracle into R using RODBC. The package RODBC needs to be installed and loaded as shown below:
> library(RODBC)
Error in library(RODBC) : there is no package called ‘RODBC’
> install.packages("RODBC")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/RODBC_1.3-15.zip'
Content type 'application/zip' length 831635 bytes (812 KB)
downloaded 812 KB
package ‘RODBC’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Administrator\AppData\Local\Temp\RtmpU3bkd9\downloaded_packages
> library("RODBC", lib.loc="C:/Program Files/R/R-3.4.1/library")
Once above steps are carried out, we can establish a connection to Oracle as shown below:
> connection <- odbcConnect("Oracle", uid="hr", pwd="hr", rows_at_time = 1000)
The command is self-explanatory. Oracle is a DSN to facilitate this connection. Once the connection is established, we can start issuing queries as shown below:
> sqlQuery(connection,"select count(*) from employees")
COUNT(*)
1 107
We can fetch the data from Oracle as shown below where the data from employee table in Oracle is imported into employee_data.
> employee_data <- sqlQuery(connection,"select * from employees")
>
> class(employee_data)
[1] "data.frame"
>
> colnames(employee_data)
[1] "EMPLOYEE_ID" "FIRST_NAME" "LAST_NAME" "EMAIL" "PHONE_NUMBER"
[6] "HIRE_DATE" "JOB_ID" "SALARY" "COMMISSION_PCT" "MANAGER_ID"
[11] "DEPARTMENT_ID"
>
> str(employee_data)
'data.frame': 107 obs. of 11 variables:
$ EMPLOYEE_ID : int 100 101 102 103 104 105 106 107 108 109 ...
$ FIRST_NAME : Factor w/ 91 levels "Adam","Alana",..: 80 64 51 4 11 18 88 20 61 16 ...
$ LAST_NAME : Factor w/ 102 levels "Abel","Ande",..: 50 51 20 44 24 4 74 57 38 27 ...
$ EMAIL : Factor w/ 107 levels "ABANDA","ABULL",..: 90 74 61 6 11 17 104 23 73 19 ...
$ PHONE_NUMBER : Factor w/ 107 levels "011.44.1343.329268",..: 37 38 39 58 59 60 57 61 50 45 ...
$ HIRE_DATE : POSIXct, format: "2003-06-17" "2005-09-21" "2001-01-13" ...
$ JOB_ID : Factor w/ 19 levels "AC_ACCOUNT","AC_MGR",..: 4 5 5 9 9 9 9 9 7 6 ...
$ SALARY : int 24000 17000 17000 9000 6000 4800 4800 4200 12008 9000 ...
$ COMMISSION_PCT: num NA NA NA NA NA NA NA NA NA NA ...
$ MANAGER_ID : int NA 100 100 102 103 103 103 103 101 108 ...
$ DEPARTMENT_ID : int 90 90 90 60 60 60 60 60 100 100 ...
>
In the same manner, we import the department data as shown below:
> department_data <- sqlQuery(connection,"select * from departments")
> department_data
DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID
1 10 Administration 200 1700
2 20 Marketing 201 1800
3 30 Purchasing 114 1700
4 40 Human Resources 203 2400
5 50 Shipping 121 1500
6 60 IT 103 1400
7 70 Public Relations 204 2700
8 80 Sales 145 2500
9 90 Executive 100 1700
10 100 Finance 108 1700
11 110 Accounting 205 1700
12 120 Treasury NA 1700
13 130 Corporate Tax NA 1700
14 140 Control And Credit NA 1700
15 150 Shareholder Services NA 1700
16 160 Benefits NA 1700
17 170 Manufacturing NA 1700
18 180 Construction NA 1700
19 190 Contracting NA 1700
20 200 Operations NA 1700
21 210 IT Support NA 1700
22 220 NOC NA 1700
23 230 IT Helpdesk NA 1700
24 240 Government Sales NA 1700
25 250 Retail Sales NA 1700
26 260 Recruiting NA 1700
27 270 Payroll NA 1700
Now that we have the employee data and department data, we can perform a inner join operation in ANSI syntax as shown below (only the first five records are shown):
> sqldf("select employee_data.*,department_name from employee_data inner join department_data on employee_data.department_id = department_data.department_id limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 101 Neena Kochhar NKOCHHAR 515.123.4568 2005-09-21 AD_VP 17000
3 102 Lex De Haan LDEHAAN 515.123.4569 2001-01-13 AD_VP 17000
4 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
5 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 NA NA 90 Executive
2 NA 100 90 Executive
3 NA 100 90 Executive
4 NA 102 60 IT
5 NA 103 60 IT
The same inner join in non ANSI syntax is shown below:
> sqldf("select e1.*,d1.department_name from employee_data e1, department_data d1 where e1.department_id = d1.department_id limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 101 Neena Kochhar NKOCHHAR 515.123.4568 2005-09-21 AD_VP 17000
3 102 Lex De Haan LDEHAAN 515.123.4569 2001-01-13 AD_VP 17000
4 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
5 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 NA NA 90 Executive
2 NA 100 90 Executive
3 NA 100 90 Executive
4 NA 102 60 IT
5 NA 103 60 IT
Left outer join is shown below:
> sqldf("select employee_data.*,department_name from employee_data left join department_data on employee_data.department_id = department_data.department_id order by department_name asc limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 178 Kimberely Grant KGRANT 011.44.1644.429263 2007-05-24 SA_REP 7000
2 205 Shelley Higgins SHIGGINS 515.123.8080 2002-06-07 AC_MGR 12008
3 206 William Gietz WGIETZ 515.123.8181 2002-06-07 AC_ACCOUNT 8300
4 200 Jennifer Whalen JWHALEN 515.123.4444 2003-09-17 AD_ASST 4400
5 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 0.15 149 NA <NA>
2 NA 101 110 Accounting
3 NA 205 110 Accounting
4 NA 101 10 Administration
5 NA NA 90 Executive
Once all the queries against Oracle are complete, we can close the connection as shown below:
> close(connection)
Thus, we have seen how SQL can be used in R environment. More details about the available functions can be seen at:
https://www.sqlite.org/lang_corefunc.html
https://www.sqlite.org/lang_aggfunc.html
https://www.sqlite.org/lang_datefunc.html
> sqldf("select am,sum(wt) from mtcars group by am")
am sum(wt)
1 0 71.609
2 1 31.343
> sqldf("select carb,avg(wt) from mtcars group by carb")
carb avg(wt)
1 1 2.4900
2 2 2.8628
3 3 3.8600
4 4 3.8974
5 6 2.7700
6 8 3.5700
Group by clause usage is shown in the next query:
> sqldf("select carb,avg(wt) from mtcars group by carb having avg(wt) >= 3.0")
carb avg(wt)
1 3 3.8600
2 4 3.8974
3 8 3.5700
The next query shows a subquery:
> sqldf("select * from mtcars where disp = (select max(disp) from mtcars)")
mpg cyl disp hp drat wt qsec vs am gear carb
1 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4
Set operations like union can be performed and an example is shown below:
> sqldf("select * from mtcars where disp = (select max(disp) from mtcars) union select * from mtcars where disp = (select min(disp) from mtcars)", row.names = TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
The last query gives the details of the vehicle having the maximum and minimum displacement in cubic inches.
The next query shows a correlated subquery:
> sqldf("select * from employee_data as e1 where salary > (select avg(salary) from employee_data where department_id = e1.department_id)")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
3 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
4 108 Nancy Greenberg NGREENBE 515.124.4569 2002-08-17 FI_MGR 12008
5 109 Daniel Faviet DFAVIET 515.124.4169 2002-08-16 FI_ACCOUNT 9000
6 114 Den Raphaely DRAPHEAL 515.127.4561 2002-12-07 PU_MAN 11000
7 120 Matthew Weiss MWEISS 650.123.1234 2004-07-18 ST_MAN 8000
8 121 Adam Fripp AFRIPP 650.123.2234 2005-04-10 ST_MAN 8200
9 122 Payam Kaufling PKAUFLIN 650.123.3234 2003-05-01 ST_MAN 7900
10 123 Shanta Vollman SVOLLMAN 650.123.4234 2005-10-10 ST_MAN 6500
11 124 Kevin Mourgos KMOURGOS 650.123.5234 2007-11-16 ST_MAN 5800
12 137 Renske Ladwig RLADWIG 650.121.1234 2003-07-14 ST_CLERK 3600
13 141 Trenna Rajs TRAJS 650.121.8009 2003-10-17 ST_CLERK 3500
14 145 John Russell JRUSSEL 011.44.1344.429268 2004-10-01 SA_MAN 14000
15 146 Karen Partners KPARTNER 011.44.1344.467268 2005-01-05 SA_MAN 13500
16 147 Alberto Errazuriz AERRAZUR 011.44.1344.429278 2005-03-10 SA_MAN 12000
17 148 Gerald Cambrault GCAMBRAU 011.44.1344.619268 2007-10-15 SA_MAN 11000
18 149 Eleni Zlotkey EZLOTKEY 011.44.1344.429018 2008-01-29 SA_MAN 10500
19 150 Peter Tucker PTUCKER 011.44.1344.129268 2005-01-30 SA_REP 10000
20 151 David Bernstein DBERNSTE 011.44.1344.345268 2005-03-24 SA_REP 9500
21 152 Peter Hall PHALL 011.44.1344.478968 2005-08-20 SA_REP 9000
22 156 Janette King JKING 011.44.1345.429268 2004-01-30 SA_REP 10000
23 157 Patrick Sully PSULLY 011.44.1345.929268 2004-03-04 SA_REP 9500
24 158 Allan McEwen AMCEWEN 011.44.1345.829268 2004-08-01 SA_REP 9000
25 162 Clara Vishney CVISHNEY 011.44.1346.129268 2005-11-11 SA_REP 10500
26 163 Danielle Greene DGREENE 011.44.1346.229268 2007-03-19 SA_REP 9500
27 168 Lisa Ozer LOZER 011.44.1343.929268 2005-03-11 SA_REP 11500
28 169 Harrison Bloom HBLOOM 011.44.1343.829268 2006-03-23 SA_REP 10000
29 170 Tayler Fox TFOX 011.44.1343.729268 2006-01-24 SA_REP 9600
30 174 Ellen Abel EABEL 011.44.1644.429267 2004-05-11 SA_REP 11000
31 184 Nandita Sarchand NSARCHAN 650.509.1876 2004-01-27 SH_CLERK 4200
32 185 Alexis Bull ABULL 650.509.2876 2005-02-20 SH_CLERK 4100
33 188 Kelly Chung KCHUNG 650.505.1876 2005-06-14 SH_CLERK 3800
34 189 Jennifer Dilly JDILLY 650.505.2876 2005-08-13 SH_CLERK 3600
35 192 Sarah Bell SBELL 650.501.1876 2004-02-04 SH_CLERK 4000
36 193 Britney Everett BEVERETT 650.501.2876 2005-03-03 SH_CLERK 3900
37 201 Michael Hartstein MHARTSTE 515.123.5555 2004-02-17 MK_MAN 13000
38 205 Shelley Higgins SHIGGINS 515.123.8080 2002-06-07 AC_MGR 12008
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
1 NA NA 90
2 NA 102 60
3 NA 103 60
4 NA 101 100
5 NA 108 100
6 NA 100 30
7 NA 100 50
8 NA 100 50
9 NA 100 50
10 NA 100 50
11 NA 100 50
12 NA 123 50
13 NA 124 50
14 0.40 100 80
15 0.30 100 80
16 0.30 100 80
17 0.30 100 80
18 0.20 100 80
19 0.30 145 80
20 0.25 145 80
21 0.25 145 80
22 0.35 146 80
23 0.35 146 80
24 0.35 146 80
25 0.25 147 80
26 0.15 147 80
27 0.25 148 80
28 0.20 148 80
29 0.20 148 80
30 0.30 149 80
31 NA 121 50
32 NA 121 50
33 NA 122 50
34 NA 122 50
35 NA 123 50
36 NA 123 50
37 NA 100 20
38 NA 101 110
The above query returns those employees that have salaries above the average salaries in their respective departments. In the next query we see how we can create a temporary table and use that temporary table for querying.
> sqldf(c("create temp table temp_1 as select * from mtcars where qsec > 17.0", "select gear,avg(disp) from temp_1 group by gear"))
gear avg(disp)
1 3 322.2083
2 4 119.6545
In the above query, a temporary table called temp_1 that only exists for querying in this command . The next query uses temp_1 to calculate the average of displacement by gear.
To illustrate update feature, we a dataframe called update_sample and update the missing values in second column from the values in first column.
> update_sample <- data.frame(a = 1:5, b = c(3, NA, 5, NA, 7))
> update_sample
a b
1 1 3
2 2 NA
3 3 5
4 4 NA
5 5 7
> sqldf(c("update update_sample set b = a where b is null", "select * from update_sample"))
a b
1 1 3
2 2 2
3 3 5
4 4 4
5 5 7
It is to be remembered that update_sample is unchanged and the updated record set is only to be used in the second query as shown above. This behavior is the same for the insert and delete statements. The insert as shown below inserts a new row with values 15 and NA:
> sqldf(c("insert into update_sample (a,b) values (15,NULL)", "select * from update_sample"))
a b
1 1 3
2 2 NA
3 3 5
4 4 NA
5 5 7
6 15 NA
The delete keyword deletes a row that has a equals 4 as shown below:
update_sample <- data.frame(a = 1:5, b = c(3, NA, 5, NA, 7))
> update_sample
a b
1 1 3
2 2 NA
3 3 5
4 4 NA
5 5 7
> sqldf(c("delete from update_sample where a = 4",
+ "select * from update_sample"))
a b
1 1 3
2 2 NA
3 3 5
4 5 7
To delete all the records in the table, we can use the command shown below:
> sqldf(c("delete from update_sample", "select * from update_sample"))
[1] a b
<0 rows> (or 0-length row.names)
In the last section, we explore joins. Let us import some data from Oracle into R using RODBC. The package RODBC needs to be installed and loaded as shown below:
Error in library(RODBC) : there is no package called ‘RODBC’
> install.packages("RODBC")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/RODBC_1.3-15.zip'
Content type 'application/zip' length 831635 bytes (812 KB)
downloaded 812 KB
package ‘RODBC’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Administrator\AppData\Local\Temp\RtmpU3bkd9\downloaded_packages
> library("RODBC", lib.loc="C:/Program Files/R/R-3.4.1/library")
Once above steps are carried out, we can establish a connection to Oracle as shown below:
> connection <- odbcConnect("Oracle", uid="hr", pwd="hr", rows_at_time = 1000)
The command is self-explanatory. Oracle is a DSN to facilitate this connection. Once the connection is established, we can start issuing queries as shown below:
> sqlQuery(connection,"select count(*) from employees")
COUNT(*)
1 107
We can fetch the data from Oracle as shown below where the data from employee table in Oracle is imported into employee_data.
> employee_data <- sqlQuery(connection,"select * from employees")
>
> class(employee_data)
[1] "data.frame"
>
> colnames(employee_data)
[1] "EMPLOYEE_ID" "FIRST_NAME" "LAST_NAME" "EMAIL" "PHONE_NUMBER"
[6] "HIRE_DATE" "JOB_ID" "SALARY" "COMMISSION_PCT" "MANAGER_ID"
[11] "DEPARTMENT_ID"
>
> str(employee_data)
'data.frame': 107 obs. of 11 variables:
$ EMPLOYEE_ID : int 100 101 102 103 104 105 106 107 108 109 ...
$ FIRST_NAME : Factor w/ 91 levels "Adam","Alana",..: 80 64 51 4 11 18 88 20 61 16 ...
$ LAST_NAME : Factor w/ 102 levels "Abel","Ande",..: 50 51 20 44 24 4 74 57 38 27 ...
$ EMAIL : Factor w/ 107 levels "ABANDA","ABULL",..: 90 74 61 6 11 17 104 23 73 19 ...
$ PHONE_NUMBER : Factor w/ 107 levels "011.44.1343.329268",..: 37 38 39 58 59 60 57 61 50 45 ...
$ HIRE_DATE : POSIXct, format: "2003-06-17" "2005-09-21" "2001-01-13" ...
$ JOB_ID : Factor w/ 19 levels "AC_ACCOUNT","AC_MGR",..: 4 5 5 9 9 9 9 9 7 6 ...
$ SALARY : int 24000 17000 17000 9000 6000 4800 4800 4200 12008 9000 ...
$ COMMISSION_PCT: num NA NA NA NA NA NA NA NA NA NA ...
$ MANAGER_ID : int NA 100 100 102 103 103 103 103 101 108 ...
$ DEPARTMENT_ID : int 90 90 90 60 60 60 60 60 100 100 ...
>
In the same manner, we import the department data as shown below:
> department_data <- sqlQuery(connection,"select * from departments")
> department_data
DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID
1 10 Administration 200 1700
2 20 Marketing 201 1800
3 30 Purchasing 114 1700
4 40 Human Resources 203 2400
5 50 Shipping 121 1500
6 60 IT 103 1400
7 70 Public Relations 204 2700
8 80 Sales 145 2500
9 90 Executive 100 1700
10 100 Finance 108 1700
11 110 Accounting 205 1700
12 120 Treasury NA 1700
13 130 Corporate Tax NA 1700
14 140 Control And Credit NA 1700
15 150 Shareholder Services NA 1700
16 160 Benefits NA 1700
17 170 Manufacturing NA 1700
18 180 Construction NA 1700
19 190 Contracting NA 1700
20 200 Operations NA 1700
21 210 IT Support NA 1700
22 220 NOC NA 1700
23 230 IT Helpdesk NA 1700
24 240 Government Sales NA 1700
25 250 Retail Sales NA 1700
26 260 Recruiting NA 1700
27 270 Payroll NA 1700
Now that we have the employee data and department data, we can perform a inner join operation in ANSI syntax as shown below (only the first five records are shown):
> sqldf("select employee_data.*,department_name from employee_data inner join department_data on employee_data.department_id = department_data.department_id limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 101 Neena Kochhar NKOCHHAR 515.123.4568 2005-09-21 AD_VP 17000
3 102 Lex De Haan LDEHAAN 515.123.4569 2001-01-13 AD_VP 17000
4 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
5 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 NA NA 90 Executive
2 NA 100 90 Executive
3 NA 100 90 Executive
4 NA 102 60 IT
5 NA 103 60 IT
The same inner join in non ANSI syntax is shown below:
> sqldf("select e1.*,d1.department_name from employee_data e1, department_data d1 where e1.department_id = d1.department_id limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
2 101 Neena Kochhar NKOCHHAR 515.123.4568 2005-09-21 AD_VP 17000
3 102 Lex De Haan LDEHAAN 515.123.4569 2001-01-13 AD_VP 17000
4 103 Alexander Hunold AHUNOLD 590.423.4567 2006-01-03 IT_PROG 9000
5 104 Bruce Ernst BERNST 590.423.4568 2007-05-21 IT_PROG 6000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 NA NA 90 Executive
2 NA 100 90 Executive
3 NA 100 90 Executive
4 NA 102 60 IT
5 NA 103 60 IT
Left outer join is shown below:
> sqldf("select employee_data.*,department_name from employee_data left join department_data on employee_data.department_id = department_data.department_id order by department_name asc limit 5")
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 178 Kimberely Grant KGRANT 011.44.1644.429263 2007-05-24 SA_REP 7000
2 205 Shelley Higgins SHIGGINS 515.123.8080 2002-06-07 AC_MGR 12008
3 206 William Gietz WGIETZ 515.123.8181 2002-06-07 AC_ACCOUNT 8300
4 200 Jennifer Whalen JWHALEN 515.123.4444 2003-09-17 AD_ASST 4400
5 100 Steven King SKING 515.123.4567 2003-06-17 AD_PRES 24000
COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME
1 0.15 149 NA <NA>
2 NA 101 110 Accounting
3 NA 205 110 Accounting
4 NA 101 10 Administration
5 NA NA 90 Executive
Once all the queries against Oracle are complete, we can close the connection as shown below:
> close(connection)
Thus, we have seen how SQL can be used in R environment. More details about the available functions can be seen at:
https://www.sqlite.org/lang_corefunc.html
https://www.sqlite.org/lang_aggfunc.html
https://www.sqlite.org/lang_datefunc.html