Global Journal of Computer Science and Technology, C: Software & Data Engineering, Volume 22 Issue 2

Integration of the Big Data Environment in a Financial Sector Entity to Optimize Products, Services and Decision-Making

Ulises Roman Concha α, José Huapaya Vásquez σ, Guillermo Morales Romero ρ & Dominga Cano Ccoa Ѡ

Abstract- This article describes the integration of a big data environment into the management of products and services of a banking entity, with the aim of optimizing financial products and decision-making. Currently, many financial entities have business areas with isolated databases, which causes greater consumption of computing resources, higher maintenance effort and, in many cases, process delays. This problem becomes especially critical in transnational companies, because data needs can vary geographically even within the same functional area. The Data Architecture area proposed guidelines such as centralizing information in a big data environment, progressively guaranteeing user access for new financial analytics initiatives and thereby reducing isolated data. The agile Scrum framework supported the advanced analytics pilot, which comprised developments in the data ingestion layer (data lake) through the distributed processing of Apache Spark, and information consumption through sandboxes, in which users perform data analysis, visualization and prediction. All of this was framed in stages: Geographic Diagnosis, Platform Validation, and Design and Development of the Pilot.

Keywords: big data, banking entity, financial sector, decision-making, sandbox, data lake, scrum, spark.

I. Introduction

The integration of a Big Data environment in a financial institution requires management concepts, experience in software development, data modeling and agile frameworks. The limitations discovered in a financial institution in Peru stemmed from the fact that the architectural components were installed on servers in Mexico, creating dependence on operators in that country for the administration of the Big Data environment.
In addition, users are heavily invested in the operation of their legacy systems, which makes them resistant to change; the data architecture area must therefore provide information and knowledge to users so that they become involved and are satisfied with the improvements that a Big Data environment brings.

The integration of this Big Data environment was carried out using the agile Scrum framework, with a multidisciplinary team that brought business users together with data scientists, developers and data architects. The solutions built on this Big Data environment rely on the advantages of Apache Spark as the processing engine and HDFS (Hadoop Distributed File System) as the storage tool. The analytical environments for data scientists are sandboxes: workspaces that can run notebooks developed in Python and Scala, allowing data scientists to carry out machine learning initiatives or work on analytical models with different distributed processing libraries. These sandboxes have access to certified and governed data hosted in the Data Lake, which is the storage component of the environment.

The Data Lake contains the raw data stored from the different applications of the business areas. All of this raw information is mastered so that it can be consumed by users; that is, through data treatment, the information is grouped into functional concepts called "application units". These mastered data thereby gain value and veracity, so they can be exploited by the different business areas.

The main stages of the integration of the Big Data environment are: Geographic Diagnosis, Platform Validation, and Design and Development of the Pilot.

II.
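The raw-to-"application unit" mastering described above can be sketched as follows. This is a minimal illustration only: the paper performs this step with Apache Spark over the Data Lake, whereas here plain Python structures stand in for Spark DataFrames for portability, and all field names ("customer_id", "amount") are illustrative assumptions, not taken from the source.

```python
def master_raw_records(raw_records):
    """Group raw per-transaction rows into one record per customer
    (a toy 'application unit'), discarding rows that fail basic checks.

    Mirrors, in miniature, the treatment that gives mastered data its
    value and veracity before business areas consume it.
    """
    units = {}
    for row in raw_records:
        # Veracity check: drop malformed raw rows instead of mastering them.
        if not row.get("customer_id") or row.get("amount") is None:
            continue
        unit = units.setdefault(
            row["customer_id"],
            {"customer_id": row["customer_id"],
             "total_amount": 0.0,
             "transactions": 0},
        )
        unit["total_amount"] += float(row["amount"])
        unit["transactions"] += 1
    return list(units.values())


# Hypothetical raw rows as they might land in the ingestion layer.
raw = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 50.0},
    {"customer_id": None, "amount": 10.0},  # malformed row, filtered out
]
mastered = master_raw_records(raw)
```

In the real environment the same grouping would be expressed as a Spark aggregation (e.g. `groupBy` plus `agg`) running distributed over HDFS; the stdlib version above only conveys the shape of the transformation.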
Methods

The implementation of the integration of the Big Data environment began with a pilot in the Business Development area (the commercial area for the bank's products), as detailed in the following stages: first determining the scope of the pilot, then validating the tools, and finally deploying the products that validate the integration of the environment and make it available for other analytical projects.

a) Geographic Diagnosis

The implementation area adopted is Business Development (or the commercial area), for having several

Author α: Professor of the Department of Computer Science, UNMSM, Lima, Peru. e-mail: nromanc@unmsm.edu.pe
Author σ: Graduated from the Faculty of Systems Engineering and Informatics, Lima, Peru. e-mail: jose.alberto.huapaya@gmail.com
Author ρ: Professor at the National University of Education, Lima, Peru. e-mail: gmorales@une.edu.pe
Author Ѡ: Professor at the National University of Juliaca, Lima, Peru. e-mail: dm.cano@unaj.edu.pe
