The CloudOps dashboard improves observability across multiple clouds. It provides a bird’s-eye view of operations and highlights important issues that can affect smooth cloud operations.
This dashboard is divided into four sections: Tenant Based Summary, Information, Insights and Inference.
Tenant Based Summary
If an Account Admin manages multiple tenants, the admin can view CloudOps dashboard details focused on the selected tenant.
The Information section shows the current state of the cloud infrastructure.
Every activity performed on the cloud in the last 24 hours, whether through CoreStack or directly in the AWS or Azure portal, is captured in the activity log for immediate and future reference. For example, provisioning a VM or updating a resource is recorded as an activity. This helps you monitor the activities performed in the cloud. An increase in the number of activity logs is a cause for concern.
Threshold alerts indicate that a metric has crossed its preset threshold limit. An alert could be triggered by, for example, a spike in CPU usage or VM downtime.
Click on an alert for a more detailed view, analysis and methods of resolution.
Click the graph icon to view the utilization trend of the specific metric for which the alert was received.
Metric utilization trends can be viewed over a day, a week or a month. In addition, CoreStack’s machine learning capabilities forecast the utilization trend for the next 15 days. This helps you plan and make informed decisions.
The Utilization Trend is split into three sections: Observation, Prediction and Prescribe.
The Observation section shows the deviations in the metric. It compares the average threshold of a given metric with the recorded deviation and lists the top three deviations noted by the system.
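One way to picture this ranking is the sketch below. It assumes that a deviation is the amount by which a recorded value exceeds the average threshold and that the list keeps the three largest. The sample readings, the threshold value and the function name `top_deviations` are hypothetical, and CoreStack's actual calculation may differ.

```python
from typing import List, Tuple

def top_deviations(
    readings: List[Tuple[str, float]],
    average_threshold: float,
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Return the largest deviations above the average threshold (assumed definition)."""
    deviations = [
        (timestamp, round(value - average_threshold, 2))
        for timestamp, value in readings
        if value > average_threshold
    ]
    # Largest deviation first, keep only the top `limit` entries.
    return sorted(deviations, key=lambda item: item[1], reverse=True)[:limit]

# Hypothetical CPU utilization readings (percent).
readings = [
    ("Mon 09:00", 72.0),
    ("Mon 14:00", 95.5),
    ("Tue 11:00", 88.0),
    ("Wed 16:00", 99.0),
    ("Thu 10:00", 91.0),
]
print(top_deviations(readings, average_threshold=85.0))
```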
The Prediction section estimates how the utilization of the metric will vary over the next 24 hours and the next 7 days.
In the Prescribe section, you can rewrite the threshold condition based on usage. Depending on the average trend, the buffer value can be increased, for example by 20%, thereby changing the threshold limit. To change the threshold limit:
1. View the weekly average value, highlighted in the blue box. This is the base value on top of which the buffer limit is set. In this example, it is 91.30.
2. Use the drop-down menu to set the buffer value. In this example, choose 20%.
3. The recommended value is now 109.56, as shown in the red box. Click Apply to set the new threshold limit.
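The arithmetic behind the recommended value in the steps above can be sketched as follows. This is a minimal illustration, assuming the recommended threshold is the weekly average increased by the buffer percentage, which is consistent with the 91.30 and 109.56 figures in the example. The function name `recommended_threshold` is hypothetical and not part of CoreStack.

```python
def recommended_threshold(weekly_average: float, buffer_percent: float) -> float:
    """Raise the weekly average by the buffer percentage (assumed formula)."""
    return round(weekly_average * (1 + buffer_percent / 100), 2)

# Values from the example: weekly average 91.30, buffer 20%.
print(recommended_threshold(91.30, 20))  # 109.56
```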
Resolving a Threshold Alert
Resolve threshold alerts using the Resolve button provided next to each alert.
This opens a pop-up box with further resolution options. Each resolution option is assigned a confidence level. CoreStack’s machine learning capabilities learn from the decisions made each time and increase the confidence levels over time.
The resolution actions vary based on the resource type for which the threshold alert is detected, and the confidence level is calculated based on the previous actions performed by users. Three resolution actions are available for virtual machines:
- Stop the virtual machine
- Start the virtual machine
- Resize the virtual machine
You can choose any of these actions and apply it from here to resolve the alert.
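To give a rough intuition for how a confidence level could be derived from previous user actions, the sketch below scores each action by its share of past choices. This is an assumption made purely for illustration: the document does not describe CoreStack's actual model, and the function name `confidence_levels` and the sample history are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def confidence_levels(past_actions: List[str], available_actions: List[str]) -> Dict[str, float]:
    """Score each action by its percentage share of past choices (illustrative only)."""
    counts = Counter(past_actions)
    total = sum(counts[action] for action in available_actions)
    if total == 0:
        # No history yet, so no action is favored.
        return {action: 0.0 for action in available_actions}
    return {action: round(100 * counts[action] / total, 1) for action in available_actions}

# Hypothetical history of resolutions chosen for similar alerts.
history = ["resize", "resize", "stop", "resize", "start"]
print(confidence_levels(history, ["stop", "start", "resize"]))
```

With this history, resizing receives the highest score because it was chosen most often in the past.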
CoreStack uses automation features such as Templates and Scripts to streamline and automate operations. At times, issues such as technical faults prevent these Templates and Scripts from being executed. These failures appear in the Automation Failures list.
Click View to drill down into the details of each failure and attempt resolution.
In this example, the status of the automation job is displayed as Create_Failed. To rerun the job, navigate to the Action column and select rerun from the drop-down menu.
Person Hours Saved
CoreStack’s automation capabilities help organizations cut back on the person hours otherwise spent on manual management and governance of cloud environments. This counter shows the number of person hours saved by CoreStack.
The Insights section displays the state of the cloud environment over the past few days. The number of activity logs and threshold alerts determines the noisiness of cloud accounts, and the section displays the accounts that face problems most frequently.
Last 30 days trends
This line graph displays the number of activity logs, automation failures and threshold alerts for each day over a 30-day period. This helps you monitor spikes and identify their causes.
Top 5 Noisy Accounts (Across Cloud Account)
This pie chart shows the top 5 cloud accounts with the highest number of threshold alerts and activity logs.
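The ranking behind such a chart can be sketched as a simple count per account. The example below assumes noisiness is the combined number of threshold alerts and activity logs for an account; the event records, account names and the function name `top_noisy` are hypothetical and do not reflect CoreStack's internal data model.

```python
from collections import Counter
from typing import List, Tuple

def top_noisy(events: List[Tuple[str, str]], limit: int = 5) -> List[Tuple[str, int]]:
    """Count threshold alerts and activity logs per account and return the noisiest ones."""
    counts = Counter(account for account, _kind in events)
    return counts.most_common(limit)

# Hypothetical events as (account, kind) pairs.
events = (
    [("prod-aws", "threshold_alert")] * 4
    + [("prod-aws", "activity_log")] * 6
    + [("dev-azure", "activity_log")] * 3
    + [("test-aws", "threshold_alert")] * 1
)
print(top_noisy(events))
```

The same counting approach would apply to resources by keying each event on a resource identifier instead of an account.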
Top 5 Noisy Resources (Across Cloud Account)
This pie chart shows the top 5 resources, such as virtual machines or servers, with the highest number of threshold alerts and activity logs.
Inference (Threshold Alert Prediction)
This section displays the forecast increase or decrease in threshold alerts. Forecasts are shown for the next 1 day, 2 days and 15 days.