The business world increasingly demands near real-time insight into every aspect of business performance. As a result, traditional BI solutions, with their long deployment cycles, heavy capital expenditure, and constant reliance on IT to generate reports, are rapidly giving way to cloud-hosted solutions that deliver insights faster and at a much lower total cost of ownership. For most IT executives, the question now is not if but when and how to move at least some parts of their business intelligence infrastructure to the cloud.
But with the myriad technology options out there, how do you plan, implement, and operationalise a transition to the cloud? What resources will you need? What hardware and software? How long will it take? And how do you come up with quick, high-level, comparative estimates of capital and operating expenses for cloud-based and on-premises solutions?
In this article, custom application developers from Itransition present a tool-agnostic framework that helps IT managers address all of these questions quickly. Using it, managers can translate a strategic vision for cloud BI into tactical implementation projects, each with a clear agenda, milestones, delivery methods, cost and resource requirements, and dependencies. Note that the framework is entirely conceptual and applies to any cloud platform or software stack with negligible variation.
Conceptual architecture overview — a layered approach
Separation of concerns is an age-old principle of software architecture. The layered approach introduced below is inspired by the same principle: it divides the entire business intelligence delivery process into decoupled, logically independent layers that can be architected, designed, and developed largely in isolation from one another. These layers include:
1. Physical hosting layer
Planning this layer largely boils down to selecting the right cloud platform. The main options on the market are Microsoft Azure, Amazon Web Services, and Google Cloud Platform, along with smaller players such as VMware. The right choice depends on your budget, the need to connect to other systems on your premises, and the infrastructure you have already deployed (for example, Microsoft shops would typically favor Azure).
A specific consideration here is the availability of key BI tools under an on-demand pricing model on each platform. For example, AWS offers pre-built machine images for tools such as Informatica and Tableau, which should weigh heavily if you plan to use these tools as part of your BI technology stack.
From a conceptual planning perspective, architecting the physical hosting layer would imply:
- Selecting the cloud platform based on due diligence around the points above.
- Deciding on what specific software you would need to run and how it would be interconnected.
- Coming up with tentative estimates for operating costs based on indicative hardware pricing (these would be refined as you complete the architecture of the other layers).
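As a rough illustration of that last point, a first-cut operating estimate can be as simple as a spreadsheet or a few lines of code. The sketch below is purely illustrative: the line items, instance descriptions, and hourly rates are placeholder assumptions, not real price-list figures, and would be replaced with the current pricing of whichever platform you shortlist.

```python
# Illustrative first-cut estimate of monthly cloud operating costs.
# All line items and hourly rates below are placeholder assumptions,
# not actual vendor pricing; substitute real price-list figures.

HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical line items: (description, hourly rate in USD, instance count)
line_items = [
    ("ETL server (mid-size VM)", 0.20, 1),
    ("Data warehouse node",      0.85, 2),
    ("BI/reporting server",      0.40, 1),
]

def monthly_cost(hourly_rate: float, count: int, utilisation: float = 1.0) -> float:
    """Pay-per-use cost for one line item; utilisation < 1.0 models
    instances that are stopped outside business hours."""
    return hourly_rate * count * HOURS_PER_MONTH * utilisation

total = 0.0
for name, rate, count in line_items:
    cost = monthly_cost(rate, count)
    total += cost
    print(f"{name:<30} ${cost:>10,.2f}/month")

print(f"{'Estimated total':<30} ${total:>10,.2f}/month")
```

Comparing a couple of such estimates against the depreciation and support costs of equivalent on-premises hardware gives exactly the kind of quick, high-level capex/opex comparison mentioned in the introduction.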
2. Data storage layer
Planning the data storage layer would involve selecting how the data will be stored in the physical layer. Some considerations include:
- How should raw data extracted from peripheral systems be stored? Do we use a data warehouse? A data mart? A combination of the two?
- Might we be better off using a big data store (e.g., Hive) and then running ETL scripts to feed processing-ready data into a data warehouse?
- Is there a case for using NoSQL databases?
- Defining the high-level flow of data from peripheral systems (e.g., CRM, web analytics, billing, or marketing automation) into the staging area where raw data is typically aggregated.
- Defining the structure of reporting databases, which will contain data that is modelled for specific business queries.
- Which specific product would we use for each of the logical components (e.g., Oracle, MySQL, or Amazon RDS for the relational database; Oracle, Teradata, or Redshift for the data warehouse; MongoDB or similar for any NoSQL storage requirements)?
- Refining the cost estimates, skills requirements, and process dependencies obtained as a first cut from the architecture of the physical layer.
While not comprehensive by any means, the above list of activities should provide managers with a solid overview of how data will be stored along with tentative estimates of costs, skills requirements, deployment timelines, and any process dependencies.
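To make the staging-to-reporting flow above more concrete, here is a minimal sketch. It uses SQLite purely as a stand-in for whichever store you actually select (Amazon RDS, Redshift, and so on), and the table and column names are illustrative assumptions: raw rows from peripheral systems land in a staging table, and a reporting table is then populated with data modelled for one specific business question.

```python
import sqlite3

# SQLite stands in for the RDBMS/warehouse actually selected for this layer;
# table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging area: raw rows aggregated from peripheral systems, stored as-is.
cur.execute("""
    CREATE TABLE staging_orders (
        source_system TEXT,   -- e.g. 'crm', 'billing', 'web_analytics'
        order_id      TEXT,
        customer_id   TEXT,
        amount        REAL,
        order_date    TEXT
    )
""")

# Reporting database: data modelled for a specific business question,
# here monthly revenue per customer.
cur.execute("""
    CREATE TABLE report_monthly_revenue (
        customer_id TEXT,
        month       TEXT,
        revenue     REAL
    )
""")

# A toy load into staging (in practice, the ETL layer does this).
cur.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?, ?, ?)",
    [
        ("billing", "o-1", "c-42", 120.0, "2024-01-15"),
        ("billing", "o-2", "c-42",  80.0, "2024-01-28"),
        ("crm",     "o-3", "c-77", 200.0, "2024-02-03"),
    ],
)

# Transform raw staging rows into the reporting-ready model.
cur.execute("""
    INSERT INTO report_monthly_revenue (customer_id, month, revenue)
    SELECT customer_id, substr(order_date, 1, 7), SUM(amount)
    FROM staging_orders
    GROUP BY customer_id, substr(order_date, 1, 7)
""")

for row in cur.execute("SELECT * FROM report_monthly_revenue ORDER BY customer_id"):
    print(row)
conn.close()
```

The same split, raw staging data on one side and query-shaped reporting data on the other, holds whether the stores end up being a warehouse, a data mart, or a big data store feeding a warehouse.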
3. ETL Layer
The ETL (or ELT, depending on how you process data) layer involves extracting data from various peripheral tools that contain the data to be analyzed (e.g., Salesforce CRM, Adobe Analytics, Tealium, SAP, Shopify, Eloqua etc.). Typical considerations in architecting this layer would include:
- Which tool do we use for ETL? (e.g., Informatica, Talend, Pentaho etc.)
- A high-level overview of the various scripts that would fetch data into the staging area and then feed it into the reporting databases.
- How do we handle offline data access (e.g., by storing long-lived tokens or using basic authentication)? What about APIs that do not provide offline access: can we fall back on FTP downloads or scheduled email exports?
- Would the architecture benefit from using canonical data models? For example, if the target data model is fixed but source data formats vary widely, then using a canonical data model will provide long-term cost savings and a faster time-to-delivery.
The considerations above can be scaled up and down depending on the accuracy of estimates and resource projections required at the planning stage.
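As an illustration of the canonical data model point from the list above, the sketch below maps two hypothetical source formats onto one canonical record. The field names are assumptions for illustration, not real CRM schemas. Adding a new source then only means writing one more small mapping function, while downstream staging and reporting code stays untouched.

```python
from dataclasses import dataclass

# Canonical record that all downstream ETL code consumes.
# Field names are illustrative assumptions, not a real CRM schema.
@dataclass
class CanonicalContact:
    contact_id: str
    email: str
    created_at: str  # ISO date
    source: str

# One small mapping function per source system; only this part changes
# when a new peripheral tool is added.
def from_source_a(record: dict) -> CanonicalContact:
    """Hypothetical CRM export with 'Id'/'Email'/'CreatedDate' fields."""
    return CanonicalContact(
        contact_id=record["Id"],
        email=record["Email"],
        created_at=record["CreatedDate"],
        source="crm_a",
    )

def from_source_b(record: dict) -> CanonicalContact:
    """Hypothetical marketing tool export with different field names."""
    return CanonicalContact(
        contact_id=str(record["contactId"]),
        email=record["emailAddress"],
        created_at=record["createdOn"],
        source="marketing_b",
    )

if __name__ == "__main__":
    rows = [
        from_source_a({"Id": "003A1", "Email": "a@example.com", "CreatedDate": "2024-03-01"}),
        from_source_b({"contactId": 9917, "emailAddress": "b@example.com", "createdOn": "2024-03-02"}),
    ]
    for row in rows:
        print(row)
```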
4. Visualisation layer
The data has been extracted, stored securely, and is available in a reporting-ready format; the next step is deciding how to turn it into information. Architecting the visualization layer largely boils down to choosing the tool for reporting and dashboards. Some considerations include:
- Do we build or buy? If the user base is large and/or the reports are largely static (little interaction or drill-down needed), it may be worth using charting libraries such as FusionCharts, Chart.js, or D3.js to build your own visualization layer. If, on the other hand, the reporting requirements are highly interactive, investing in a commercial BI tool is more appropriate.
- If you decide to buy, which tool should it be? Examples to consider include Power BI, Tableau, QlikView, Alteryx, and many others. Apart from budget, the considerations here include the availability of adapters for third-party apps, the capabilities of the REST APIs they expose, the quality of documentation and support, and the availability of skills.
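To give a feel for the "build" option, here is a minimal sketch that renders one static report page from reporting-ready data. It uses the Python Plotly library purely as an illustrative stand-in for the JavaScript charting libraries named above (an assumption of this sketch, not a recommendation), and the figures and column names are made up.

```python
# Minimal "build your own visualization" sketch. Plotly is used here as an
# illustrative stand-in for charting libraries such as those named above;
# requires `pip install plotly pandas`. All data below is made up.
import pandas as pd
import plotly.express as px

# In a real deployment this frame would be read from the reporting database.
revenue = pd.DataFrame({
    "month":   ["2024-01", "2024-02", "2024-03"],
    "revenue": [120_000, 135_500, 128_750],
})

fig = px.bar(revenue, x="month", y="revenue",
             title="Monthly revenue (illustrative data)")

# A self-contained HTML page that can sit behind the company intranet.
fig.write_html("monthly_revenue_report.html", include_plotlyjs="cdn")
print("Wrote monthly_revenue_report.html")
```

A static page like this covers the "largely static reports" case; the moment users need ad hoc drill-downs, the commercial tools above start to earn their licence fees.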
Bringing it all together in the Cloud
Notice that the conceptual layers outlined above work equally well for an on-premises BI solution. For example, you could simply buy a Tableau Server license and install it on an on-premises machine. However, using a cloud-based image of Tableau Server that can be launched instantly on hardware of your choosing, with costs driven by a pay-per-use model, may offer significant long-term cost advantages. Using the framework above, managers can not only plan detailed implementations but also produce accurate estimates of project costs and timelines, which in turn support the early-stage decision of whether to migrate to the cloud or keep things on premises.
Written by Darya Nehaychik, technology observer at Itransition.