MS Informatics e-Portfolio
Competency F
Conduct informatics analysis and visualization applied to different real-world fields, such as health science and sports
Introduction
The terms data analysis and visualization are often used in data science circles or to describe the capabilities of new business platforms that aim to solve organizational problems through data-driven processes. Although analysis and visualization come up frequently in conversation, I believe it’s important to understand what these terms really mean. Data analytics, as defined by Oracle (n.d.), is the process of transforming organizational data from a variety of sources into valuable insights. Historically, businesses gathered data from structured databases; now they can also gather data from semi-structured and unstructured sources (e.g., documents, videos, and images). There are four main types of data analytics, each focused on a different goal: predictive analytics, which aims to predict what will happen in current scenarios; prescriptive analytics, which aims to optimize future strategies; diagnostic analytics, which seeks to understand what happened and identify the cause of a problem; and descriptive analytics, whose insights are primarily used for business intelligence (Oracle, n.d.).
Data visualization falls under the umbrella of data analytics; its focus is to bridge the gap between human and machine by illustrating insights from data in ways people can understand. IBM Cloud Education (2021) defined data visualization as “the representation of data through use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand” (para. 1). Common types of data visualizations include tables, pie charts and stacked bar charts, line charts and area charts, histograms, scatter plots, heat maps, and tree maps (IBM Cloud Education, 2021). Visualizations are a powerful tool for turning large amounts of complex data into an understandable format; however, Godsey (2017) warned that they need to be designed and tailored to their target application: “visualizations of data and results can be helpful in reports and applications, but if they’re not designed well, they can be detrimental to the product's intent” (p. 253).
With these definitions in mind, Competency F focuses on applying analysis and visualization in informatics practice to gain insights that further our knowledge of real-world fields. Analysis and visualization benefit many scientific fields, such as health science, because these fields focus on answering questions and testing hypotheses with tools such as models and simulations. Industries such as technology and retail can also benefit from analysis and visualization through applications like market research and the evaluation of internal operations.
Discussion
The value of data analysis and visualization has been discussed in several of my informatics classes. In INFM 206, I learned the value of governing and standardizing data, which improves accuracy when the data is eventually processed and therefore yields insights that better reflect what is actually happening. INFM 207 introduced me to the value of metadata and how it can be used to better understand the context of the data (e.g., where it came from, what it is for, and who created it). In INFM 202, INFM 208, and INFM 216, I learned about scenarios in which bad actors access an organization’s data and insights for purposes such as espionage and personal gain, as well as how best to defend against these threats. Although these classes touched on data analysis and visualization, INFM 203 and INFM 210 gave me the most direct opportunities to learn and demonstrate my understanding of these concepts in relation to Competency F.
Evidence
In this section, I submit two assignments that demonstrate Competency F: my INFM 203 mini-project on a canned craft beer dataset and my INFM 210 project on implementing a healthcare analytics program at a medium-sized hospital. Both projects focus on analytics and visualization and how they can be applied to real-world fields. The two projects differ in their end goals: the INFM 203 project is about my interactions with a particular dataset and how I intend to answer my questions using analytical methods, whereas the INFM 210 project centers on an implementation plan to introduce analytics into the daily operations of a hospital.
In the first project, the INFM 203 mini-project on a canned craft beer dataset, I presented my impressions of a CSV dataset and the questions I wanted to ask about it, explained the steps I took to conduct exploratory analysis, documented my plans for deeper analysis, and discussed the lessons I took away from the project. The dataset I explored contained data about different canned craft beers and their attributes, including alcohol content (abv, or Alcohol by Volume), bitterness (ibu, or International Bitterness Units), beer name, beer style, brewery ID (linking to a second dataset containing each brewery’s name, city, and state), and can volume in ounces. My presentation also included sections on the nature of the dataset: where it came from, how its author originally created it, and how it is structured.
After explaining what the dataset was about and how it was organized, I went on to present my own research. In my research, I considered what insights I hoped to gain from the dataset by asking the right questions of the given data (e.g., is there a relationship between a beer’s style and its alcohol content in abv?). To answer these questions, I described my process for light exploratory analysis and how I intended to conduct heavier analysis in the future. At the end of the presentation, I described how the mini-project challenged me to think in new ways and served as a learning experience, noting the skills and insights I gained as well as the gaps in my knowledge I need to fill in the future. This first piece of evidence demonstrates Competency F because I assumed the role of a data scientist and identified and critically evaluated a dataset relating to a real-world industry (i.e., the brewing industry). I also had the opportunity to conduct preliminary data analysis and visualization and to construct a plan for further exploration of the dataset.
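To make the exploratory step described above more concrete, the following is a minimal sketch in Python using only the standard library. It is not the code from the mini-project. It assumes the beer data and brewery data are two CSV files with hypothetical column names (`name`, `style`, `abv`, `brewery_id` for beers; `brewery_id`, `name`, `city`, `state` for breweries), which may not match the actual dataset. It joins each beer to its brewery through the brewery ID and computes the mean abv per style, which is only a first descriptive look at the style and abv question, not a test of whether a relationship exists.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def join_breweries(beers, breweries):
    """Left-join each beer to its brewery on the (hypothetical) brewery_id column.

    Beers whose brewery_id has no match keep None for the brewery fields.
    """
    by_id = {row["brewery_id"]: row for row in breweries}
    joined = []
    for beer in beers:
        brewery = by_id.get(beer["brewery_id"], {})
        joined.append({
            **beer,
            # Renamed so the brewery's name does not overwrite the beer's name.
            "brewery_name": brewery.get("name"),
            "city": brewery.get("city"),
            "state": brewery.get("state"),
        })
    return joined


def mean_abv_by_style(beers):
    """Return the average abv for each style, skipping rows with a blank abv."""
    abv_values = defaultdict(list)
    for beer in beers:
        if beer["abv"].strip():  # csv yields "" for an empty field
            abv_values[beer["style"]].append(float(beer["abv"]))
    return {style: mean(values) for style, values in abv_values.items()}
```

Calling `mean_abv_by_style(join_breweries(beers, breweries))` on the parsed files returns a dictionary of average abv per style. A natural next step would be a visualization, such as a box plot of abv by style, or a statistical test to judge whether the differences between styles are meaningful.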
In the second project, the INFM 210 project on implementing a healthcare analytics program at a medium-sized hospital, I presented a hypothetical analytics implementation plan aimed at improving daily operations at Stanford Healthcare ValleyCare hospital, with upper management as the assumed audience. In the first part of this presentation, I explained the role analytics plays in the healthcare industry and built a case for having an analytics program at ValleyCare. After making this case, I laid out the groundwork for the implementation plan. The plan consisted of an iterative four-step lifecycle comprising the following stages: assess, plan, execute, and check.
In this four-step cycle, I highlighted that ValleyCare must begin with the assess step by asking what goals it wishes to accomplish with the program and gauging whether it is ready to commit to designing a solution. In the next step, plan, the hospital must work together across the organization to decide what data it will use and what tools and processes it will need to analyze that data. In the execute step, ValleyCare uses the tools and processes it has chosen to gain insights. In the last step, check, the hospital evaluates the effectiveness of the analytics program by examining its strengths and weaknesses. Identifying areas for improvement in the check stage is valuable because this feedback can be used to improve processes when the lifecycle restarts with the assess stage. This second piece of evidence demonstrates Competency F because it examines the role that data analytics and visualization play in the field of healthcare. It also proposes a four-step analytics implementation plan inspired by the goals and outcomes of iterative design, a methodology that emphasizes gradual improvement and optimization.
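The iterative lifecycle described above can be modeled as a simple loop. The following minimal Python sketch is a hypothetical illustration rather than part of the original plan: the stage names come from the plan, but `run_lifecycle` and its `check` callback are invented here to show how the findings from one check stage carry forward into the next assess stage.

```python
# Stage names follow the plan; everything else is a hypothetical illustration.
STAGES = ("assess", "plan", "execute", "check")


def run_lifecycle(cycles, check):
    """Run the assess -> plan -> execute -> check loop `cycles` times.

    `check` is a hypothetical callback that takes the cycle number and returns
    a list of improvement areas. Those findings remain available to every
    stage of the next cycle, starting with assess.
    Returns a log of (cycle, stage, findings available at that stage).
    """
    findings = []  # improvement areas from the most recent check stage
    log = []
    for cycle in range(1, cycles + 1):
        for stage in STAGES:
            if stage == "check":
                # Evaluate this cycle's strengths and weaknesses.
                findings = list(check(cycle))
            log.append((cycle, stage, tuple(findings)))
    return log
```

For example, `run_lifecycle(2, lambda cycle: [f"review data quality after cycle {cycle}"])` produces eight log entries, and the second cycle's assess entry carries the finding recorded in the first cycle's check stage. Keeping the findings as explicit state is one way to show that the check stage feeds the next assessment instead of ending the process.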
INFM 203 Mini-Project
INFM 210 Healthcare Analytics Project
Conclusion
Both pieces of evidence gave me an opportunity to explore analytics and visualization and to put what I’ve learned in my previous informatics classes into practice in the context of two real-world industries (i.e., the healthcare and brewing industries). In the INFM 203 mini-project, I learned how to approach analytics and visualization like a data scientist. The project gave me a chance to explore the dataset, consider what questions I could ask given both the data provided and my knowledge of the subject, and answer those questions through analytical experimentation. The mini-project also let me assess my own strengths and weaknesses, a skill that will be valuable in my future career because identifying my knowledge gaps is essential to personal growth. In the INFM 210 project on implementing a healthcare analytics program at ValleyCare, I had a chance to practice my presentation skills. To make the presentation effective, I needed to communicate succinctly to my target audience why ValleyCare should have an analytics program and how it would need to be implemented for the highest chance of a successful outcome.
References
Godsey, B. (2017). Think like a data scientist: Tackle the data science process step-by-step. Manning.
IBM Cloud Education. (2021, February 10). Data visualization. https://www.ibm.com/cloud/learn/data-visualization
Oracle. (n.d.). What is data analytics?