2020 is over, and brighter days are coming. On the last day of 2020, I want to reflect on the past year and set up some goals for 2021. COVID-19 changed our lives and caused many losses, but I was lucky enough to pick up some new habits, take on new challenges, and achieve a few things in 2020.

Picture from https://unsplash.com/@macroman

Reflections on 2020

  • Enjoyed spending more time with family: I have been working from home since late March. At the very beginning, I worried that my two screaming little girls would ruin my work at home. It turned out that I was…

Microsoft Power BI Desktop can connect to many data sources, and Oracle Database is one of the most popular. However, Power BI's official documentation omits the steps for installing the Oracle client and is ambiguous about how to specify the ServerName/ServiceName, which makes it hard to follow. This post aims to fill the gap with step-by-step instructions to find the LDAP information for connecting to an Oracle Database, install and configure the Oracle Data Access Client (ODAC), and connect to an Oracle Database in Power BI Desktop.

Picture from https://unsplash.com/@goumbik

Lightweight Directory Access Protocol (LDAP) Connection in SQL Developer

There are several software clients that can be used to connect…
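The excerpt stops before the actual settings, so every value below is hypothetical. As a minimal sketch under that assumption, these Python helpers render two ways of identifying the database that matter here: an Easy Connect string (`host[:port]/service_name`, where 1521 is Oracle's default listener port), which is one form the Server field of Power BI's Oracle connector generally accepts, and the minimal `sqlnet.ora`/`ldap.ora` entries an Oracle client reads for LDAP naming. Host names, the service name, ports, and the `dc=example,dc=com` context are placeholders; use the values shown in your own SQL Developer LDAP connection.

```python
# Minimal sketch: renders Oracle client naming settings as text.
# Every host, port, service name, and directory context used with it is a placeholder.
from __future__ import annotations


def easy_connect(host: str, service_name: str, port: int = 1521) -> str:
    """Easy Connect string host:port/service_name (1521 is Oracle's default listener port)."""
    return f"{host}:{port}/{service_name}"


def sqlnet_ora(methods: tuple[str, ...] = ("LDAP", "TNSNAMES", "EZCONNECT")) -> str:
    """sqlnet.ora line listing the naming methods the client tries, in order."""
    return f"NAMES.DIRECTORY_PATH = ({', '.join(methods)})\n"


def ldap_ora(servers: list[tuple[str, int, int]], admin_context: str,
             server_type: str = "OID") -> str:
    """ldap.ora contents; each server is (host, non-SSL port, SSL port)."""
    server_list = ", ".join(f"{host}:{port}:{ssl_port}" for host, port, ssl_port in servers)
    return (
        f"DIRECTORY_SERVERS = ({server_list})\n"
        f'DEFAULT_ADMIN_CONTEXT = "{admin_context}"\n'
        f"DIRECTORY_SERVER_TYPE = {server_type}\n"
    )


if __name__ == "__main__":
    print(easy_connect("dbhost.example.com", "ORCLPDB1"))
    print(sqlnet_ora(), end="")
    print(ldap_ora([("ldap.example.com", 389, 636)], "dc=example,dc=com"), end="")
```

Both files normally live in the client's `network\admin` directory, or in the folder that `TNS_ADMIN` points to.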


In a previous post, I described how to create a Linux Virtual Machine (VM) and connect to it through WinSCP on Windows 10 behind a company proxy. However, that approach is not best practice from a security perspective. One best practice is to create a VM with a private IP address from a dedicated Virtual Network (VNet). The VNet can be in the same Resource Group as the VM, but it can also be in a different Resource Group or even a different Subscription. …
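When the VNet sits in another Resource Group or Subscription, the VM has to reference its subnet by full Azure resource ID rather than by name. A minimal Python sketch of that idea: it builds the subnet ID, assembles an `az vm create` argument list that attaches the VM to that subnet with an empty `--public-ip-address` (so no public IP is created), and checks that an address is private. The subscription, group, VNet, subnet, and image values you pass in are hypothetical placeholders; list real image aliases with `az vm image list`.

```python
# Minimal sketch: all subscription IDs, resource group names, and VNet/subnet names are placeholders.
from __future__ import annotations

import ipaddress


def subnet_resource_id(subscription_id: str, resource_group: str, vnet: str, subnet: str) -> str:
    """Full Azure resource ID of a subnet, usable even if it lives in another group or subscription."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def vm_create_args(vm_resource_group: str, vm_name: str, image: str, subnet_id: str) -> list[str]:
    """Argument list for `az vm create` on an existing subnet, without a public IP."""
    return [
        "az", "vm", "create",
        "--resource-group", vm_resource_group,
        "--name", vm_name,
        "--image", image,
        "--subnet", subnet_id,
        "--public-ip-address", "",  # empty string: no public IP is created
        "--generate-ssh-keys",
    ]


def is_private(ip: str) -> bool:
    """True if the address falls in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private
```

Passing the list to `subprocess.run(..., check=True)` keeps the empty `--public-ip-address` value intact, which is easy to lose when quoting by hand in a shell.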


Photo by Glenn Carstens-Peters on Unsplash

To learn Microsoft Azure Virtual Machines, the first thing you might want to try is creating a VM and connecting to it. If you follow the steps in Microsoft's official quickstart document, you will most likely fail to connect to it, especially when you are behind a company proxy. This post describes the step-by-step process I followed to create a Linux VM and connect to it through WinSCP on Windows 10 behind a company proxy.

Install Azure CLI

You can create a VM easily with the Azure Portal, but the Azure CLI would make the process even easier. Installing Azure CLI on…
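The installation steps themselves are truncated in this excerpt. One piece that usually matters behind a company proxy is that the Azure CLI, which is built on Python's `requests` library, reads the standard `HTTP_PROXY`/`HTTPS_PROXY` variables, plus `REQUESTS_CA_BUNDLE` if the proxy inspects TLS traffic. A minimal Python sketch, assuming the proxy URL and certificate path are supplied by your IT department:

```python
# Minimal sketch: an environment for running the Azure CLI behind a corporate proxy.
# The proxy URL and CA bundle path passed in are placeholders for your company's values.
from __future__ import annotations

import os


def proxied_env(proxy_url: str, ca_bundle: str | None = None,
                base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy of the environment with proxy variables set for the Azure CLI's HTTP client."""
    env = dict(os.environ if base is None else base)
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    if ca_bundle:
        # Needed when the proxy re-signs TLS traffic with a company root certificate.
        env["REQUESTS_CA_BUNDLE"] = ca_bundle
    return env
```

For example, `subprocess.run([shutil.which("az"), "login"], env=proxied_env("http://proxy.example.com:8080"), check=True)`; `shutil.which` resolves `az.cmd` on Windows.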


Photo by Benjamin Dada on Unsplash

AI Platform Notebooks is a managed service that integrates JupyterLab, Git, and optimized data science libraries and frameworks. It gives you access to almost all ML frameworks, such as TensorFlow, Keras, and PyTorch. It is an excellent place for machine learning developers to experiment, develop, and deploy models into production. In addition, AI Platform Notebooks supports enterprise security architectures through Shared VPC and private IP controls.

What is Shared VPC?

Shared Virtual Private Cloud (VPC) allows organizations to connect resources from multiple projects to a common, shared VPC network. This allows cloud resources to communicate with each other securely…
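Because a Shared VPC network belongs to the host project, a notebook instance created in a service project generally refers to the network and subnet by their full resource paths rather than by short names. The helpers below sketch those path formats (`projects/.../global/networks/...` and `projects/.../regions/.../subnetworks/...`, as used by the Compute Engine and Notebooks APIs) and a parser to sanity-check them; all project, region, and network names are hypothetical.

```python
# Minimal sketch: Shared VPC resource paths. Project, region, network, and subnet names are placeholders.
from __future__ import annotations

import re

_SUBNET_RE = re.compile(r"^projects/([^/]+)/regions/([^/]+)/subnetworks/([^/]+)$")


def network_path(host_project: str, network: str) -> str:
    """Full path of a VPC network; with Shared VPC it lives in the host project."""
    return f"projects/{host_project}/global/networks/{network}"


def subnet_path(host_project: str, region: str, subnet: str) -> str:
    """Full path of a regional subnet in the host project."""
    return f"projects/{host_project}/regions/{region}/subnetworks/{subnet}"


def parse_subnet_path(path: str) -> tuple[str, str, str]:
    """Split a subnet path into (project, region, subnet); raise ValueError if malformed."""
    match = _SUBNET_RE.match(path)
    if match is None:
        raise ValueError(f"not a subnet path: {path!r}")
    return match.group(1), match.group(2), match.group(3)
```

A check such as `parse_subnet_path(path)[0] == host_project` catches the common mistake of pointing at a subnet in the service project instead of the host project.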


As a power user of Google Cloud Platform, you definitely need the gcloud, gsutil, and bq commands to work with GCP, which means you need to install the Google Cloud SDK on your local computer. You can install the Cloud SDK in several ways, including versioned archives, an installer, apt-get/yum for Linux distributions, and even a Docker image. This post describes how to install the Cloud SDK from a versioned archive on operating systems where Python has already been installed through Anaconda. The process has been tested on both Windows 10 and Ubuntu 18.04.

Installing through versioned archives might be the best way…
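The Cloud SDK's launcher scripts pick their Python interpreter from the `CLOUDSDK_PYTHON` environment variable, so one way to make a versioned-archive install use Anaconda's Python is to set that variable before running the bundled installer (`install.sh` on Linux, `install.bat` on Windows). A minimal sketch, assuming it is run with the Anaconda interpreter and that this Python version is one your SDK release supports; `sdk_dir` is a placeholder for wherever you extracted the archive:

```python
# Minimal sketch: point a Cloud SDK versioned-archive install at the Anaconda interpreter.
# sdk_dir is wherever you extracted the archive (a placeholder here).
from __future__ import annotations

import os
import sys
from pathlib import Path


def install_command(sdk_dir: str, platform: str = sys.platform) -> list[str]:
    """Path of the installer script shipped in the extracted google-cloud-sdk folder."""
    script = "install.bat" if platform.startswith("win") else "install.sh"
    return [str(Path(sdk_dir) / script)]


def sdk_env(python_exe: str = sys.executable,
            base: dict[str, str] | None = None) -> dict[str, str]:
    """Environment with CLOUDSDK_PYTHON set, so gcloud/gsutil/bq use this interpreter."""
    env = dict(os.environ if base is None else base)
    env["CLOUDSDK_PYTHON"] = python_exe
    return env
```

For example, `subprocess.run(install_command("/opt/google-cloud-sdk"), env=sdk_env(), check=True)`. To keep the setting for later `gcloud` calls, also export `CLOUDSDK_PYTHON` in your shell profile or Windows user environment.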


Kubeflow is an open-source project that aims to make running ML workloads on Kubernetes simple, portable, and scalable. However, setting up a Kubeflow cluster in a Shared VPC on Google Cloud Platform cannot yet be done through the web console. This post describes the steps to set up Kubeflow using a Shared VPC from the command line.

Prepare the Environment

Step 1: Install the Google Cloud SDK. If you use Cloud Shell, enable boost mode. The following steps have been tested in Cloud Shell.

Step 2: Run the following command in the service project to check if…
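Kubeflow on Google Cloud runs on a GKE cluster, and the cluster is the part that has to be placed in the Shared VPC. The post's Step 2 onward is truncated and is not reproduced here; as an assumption about what the cluster-creation step involves, the sketch below assembles a `gcloud container clusters create` command following the pattern in GKE's Shared VPC documentation. It does not cover the Kubeflow deployment itself, and every name passed in is hypothetical.

```python
# Minimal sketch: `gcloud container clusters create` arguments for a GKE cluster in a Shared VPC.
# All project IDs, names, and secondary range names are placeholders.
from __future__ import annotations


def shared_vpc_cluster_args(cluster: str, service_project: str, host_project: str,
                            region: str, network: str, subnet: str,
                            pods_range: str, services_range: str) -> list[str]:
    """Build the command as a list; the network and subnet come from the host project."""
    return [
        "gcloud", "container", "clusters", "create", cluster,
        f"--project={service_project}",
        f"--region={region}",
        "--enable-ip-alias",  # VPC-native (alias IP) cluster, which Shared VPC clusters use
        f"--network=projects/{host_project}/global/networks/{network}",
        f"--subnetwork=projects/{host_project}/regions/{region}/subnetworks/{subnet}",
        # Both secondary ranges must already exist on the host project's subnet.
        f"--cluster-secondary-range-name={pods_range}",
        f"--services-secondary-range-name={services_range}",
    ]
```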


Google Cloud Dataflow is a fully managed platform that runs Apache Beam pipelines for unified stream and batch data processing. With its fully managed approach to resource provisioning and horizontal autoscaling, you get virtually unlimited capacity and optimized price-to-performance for your data pipeline processing challenges.

Shared Virtual Private Cloud (VPC) allows an organization to connect resources from multiple projects to a common VPC network. This allows cloud resources to communicate with each other securely and efficiently using internal IP addresses from that network. When you use Shared VPC, you designate a project as a Host Project and attach…
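When the worker subnet lives in a Shared VPC host project, Dataflow's documentation asks for the subnetwork to be given as a complete URL rather than a short name. A minimal sketch that assembles pipeline arguments for the Apache Beam Python SDK; the option names (`--subnetwork`, `--no_use_public_ips`, and so on) are the SDK's, but every value is a hypothetical placeholder:

```python
# Minimal sketch: Dataflow pipeline arguments for a worker subnet in a Shared VPC host project.
# Project IDs, region, subnet, and bucket names are placeholders.
from __future__ import annotations


def shared_vpc_subnet_url(host_project: str, region: str, subnet: str) -> str:
    """Complete subnetwork URL, the form Dataflow expects for a Shared VPC subnet."""
    return ("https://www.googleapis.com/compute/v1/"
            f"projects/{host_project}/regions/{region}/subnetworks/{subnet}")


def dataflow_args(service_project: str, host_project: str, region: str,
                  subnet: str, temp_location: str, private_ips: bool = True) -> list[str]:
    """Flags to hand to a Beam Python pipeline's PipelineOptions."""
    args = [
        "--runner=DataflowRunner",
        f"--project={service_project}",  # the job itself runs in the service project
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--subnetwork={shared_vpc_subnet_url(host_project, region, subnet)}",
    ]
    if private_ips:
        args.append("--no_use_public_ips")  # workers get internal IPs only
    return args
```

The list can be passed directly as `PipelineOptions(dataflow_args(...))` from `apache_beam.options.pipeline_options`.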


Hortonworks Logo from https://images.app.goo.gl/3YDkHkEiqUEUEA8a7

This post describes the process of installing Hortonworks HDP 3.1.0 on a cluster of three VMware virtual machines. The process includes four major steps: 1) set up the cluster environment; 2) set up a local repository for both the Ambari and HDP stacks; 3) install the Ambari server and agents; 4) install, configure, and deploy the cluster.

This installation process might work for other versions too. Please check compatible product versions in the Hortonworks support matrix: https://supportmatrix.hortonworks.com/
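Step 2, the local repository, mostly comes down to serving the Ambari and HDP packages over HTTP and pointing yum at them with `.repo` files (for Ambari, under `/etc/yum.repos.d/`). A minimal Python sketch that renders such a file; the repository IDs, names, and URLs are hypothetical, and the actual repository layout should follow the Hortonworks documentation for your Ambari/HDP versions:

```python
# Minimal sketch: render a yum .repo file that points at a local HTTP mirror.
# Repository IDs and URLs are placeholders for your own local repository server.
from __future__ import annotations

import configparser
import io


def repo_file(repo_id: str, name: str, baseurl: str, gpgcheck: bool = False) -> str:
    """Return .repo text for one repository section."""
    parser = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "gpgcheck": "1" if gpgcheck else "0",
        "enabled": "1",
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)  # yum-style key=value lines
    return buffer.getvalue()


if __name__ == "__main__":
    print(repo_file("ambari-local", "Ambari (local mirror)",
                    "http://repo.example.local/ambari/centos7/"))
```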

Virtual Nodes Information

Three virtual machines in VMware with the following settings. Red Hat Enterprise Linux 7.6 has been installed on each node.

Lei Feng

Big Data, Google Cloud Platform, Machine Learning, Operations Research
