Monday, 6 May 2019

Python and the Data (ADW)

Let’s get connected: Python and ADW
In this blog, I’m taking more of a Data Scientist’s / Developers view point. Python is the fastest growing language [1]. In this blog I’m going to demonstrate how you can batch load data from a Python environment to ADW. The assumption is that there is already a table within our database instance and we are simply inserting data into that table. 
Overall there are 4 stages to this:
1.    Credentials for the ADW
2.    Installing Oracle Instant Client
3.    Setting up your Python environment
4.    Loading the data

Stage 1: Credentials for the ADW
The details screen allows you to perform some basic operations with the service, including scale up/down etc. Click on the "Service Console" button.
The administration screen allows you to perform some basic administration of the service, one of which is downloading the Credentials, click the "Download Client Credentials".
When you download the wallet, you will be asked to enter a password to protect the credential store. 
Once you have downloaded the wallet you will need to un-zip the wallet, you should end up with 2 folders: 


Note: Windows 10 security, when you unzip the folder be sure that all users have full access. Right click on the folder -> Properties -> Security.

Within the unzipped wallet, Update the sqlnet.ORA file (Right click open with Notepad or other text editor). You will need to update the DIRECTORY path to the place that you have unzipped to.
Now open the tnsnames.ora, make a note of the database connection string you want to use. In this example we will be using the mldemo_high.

Stage 2: Installing Oracle Instant Client
This is a freely available program here, which provides the libraries and SDKs for building and connections to an Oracle Database instance. I won’t go into too much detail here, as there are good instructions in the link on how to install the software. The only note of caution I would add is to ensure that when the Path variable is added to your system, make sure it’s at the top of the list.
Stage 3: Prepare the Python Environment 
Package requirement: cx_Oracle
To interact with Oracle Database, cx_Oracle uses Oracle Client libraries behind the scenes. The cx_Oracle module conforms to the Python DB API specification with many additions to support Oracle Database advanced features. Each release of cx_Oracle is tested against the latest releases of Oracle Instant Client and Oracle Database.

In this example I’m using the Jupyter Notebook feature within Anaconda. I have created a new environment to download the necessary packages. It is highly advised that you create environments for packages and associate your notebooks to those environments. 
Within your environment: 
pip install cx_Oracle

Stage 4: Loading the data
Import the required packages for the demo:
Python and ADW: The hand shake!
As we did all of our set up work, then the hand shake between Python and ADW is really down to 5 lines.
Line 1: Set you TNS admin – where you put your wallet.
Line 2: Add your connection details, so this is your database username, password and connection type from the tnsnames.ora
Line 3: Connect to the database
Line 4: Check the connection by performing a query against dual
Line 5: Fetch the result from the database

Bring out the data!
In this example I’m loading the data into a Pandas dataframe, this is mainly as I do a little pre-processing before loading the data. If you do have the data in a Pandas dataframe, you’ll need to convert it to a list to enable it to bulk load.
Now we prepare the insert statement. The key variables are:
·      Which table are you inserting into
·      The column names
·      Values are the indexes of the dictionary in the list (created in the previous step)
Cusor.executemany is then used to load the list into the database. There is another version of this cursor.execute, which will only read one line at a time. 
Once completed don’t forget to commit the insert to the database:

Once this is done, the data will now be in the database. To give readers an idea how efficient this is/isn’t, then I was able to load ~9 million rows of data (10 columns), in approximately 7 minutes. Which isn’t too bad. For more information check out Oracles documentation here.


[1]. https://www.techrepublic.com/article/which-is-the-fastest-growing-programming-language-hint-its-not-javascript/

No comments:

Post a Comment