Global Journal of Human-Social Science, B: Geography, Environmental Science and Disaster Management, Volume 22 Issue 3

Centroid linkage: Each cluster is represented by its centroid, and the distance between two clusters is calculated as the distance between their centroids.

Ward method: The distance between two clusters is based on the sum of the squared deviations of each object from the centroid of its own cluster. At each step, the method joins the two clusters whose merger produces the smallest increase in the total within-group error sum of squares; it is also called the minimum variance method. Unlike the other methods, it merges groups in a way that does not dramatically increase heterogeneity, so that the variation within the groups is minimized and the resulting groups are as homogeneous as possible.

Note: developed on the basis of Frank and Todeschini (1994) and Härdle and Simar (2015).

Another important detail in cluster analysis is the standardization of the variables, since most cluster analyses based on distance measures are very sensitive to differences in scale or magnitude between variables. In general, variables with higher dispersion (higher standard deviations) have a greater impact on the final similarity value. The most common form of standardization is the conversion of each variable into standard scores (Z scores) by subtracting the mean and dividing by the standard deviation of that variable. This process converts each raw score into a standardized value with a mean of 0 and a standard deviation of 1, eliminating the bias introduced by differences in the scales of the attributes or variables used in the analysis (Hair et al., 2009).

The result of a cluster analysis can be presented as a graph called a dendrogram, which shows the observations, the sequence in which the clusters were merged, and the distances between the clusters.
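The standardization and linkage steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the use of NumPy and SciPy is an assumption, the six samples and three properties in `X` are hypothetical values chosen only to show variables on very different scales, and the sample standard deviation (`ddof=1`) is one possible choice for the Z-score denominator.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical data (not from the article): 6 samples x 3 properties
# measured on very different scales.
sample_names = ["S1", "S2", "S3", "S4", "S5", "S6"]
X = np.array([
    [35.0, 1.80, 12.0],
    [38.0, 1.78, 14.0],
    [62.0, 1.55, 30.0],
    [60.0, 1.58, 28.0],
    [20.0, 1.95, 5.0],
    [22.0, 1.93, 6.0],
])


def standardize(data):
    """Convert each column to Z scores: subtract the mean, divide by the std."""
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (data - mean) / std


z_scores = standardize(X)

# Agglomerative clustering on the standardized data (Euclidean distances).
ward_tree = linkage(z_scores, method="ward")          # minimum variance criterion
centroid_tree = linkage(z_scores, method="centroid")  # distance between centroids

# Cut the Ward tree into at most 3 groups.
labels = fcluster(ward_tree, t=3, criterion="maxclust")

# Compute the dendrogram layout without drawing it (no plotting backend needed).
layout = dendrogram(ward_tree, no_plot=True, labels=sample_names)
print(layout["ivl"])  # leaf order along the dendrogram axis
print(labels)         # cluster label assigned to each sample
```

Each row of `ward_tree` records one merge: the two clusters joined, the merge distance, and the size of the new cluster. Dropping `no_plot=True` draws the dendrogram when matplotlib is installed.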
Hastie, Tibshirani and Friedman (2009) state that a dendrogram provides a complete, interpretable description of the hierarchical clustering in graphical form, and that this is one of the main reasons for the popularity of this clustering method.

The main objective of this article is to analyze the clustering of fine-grained tropical soils based on their geotechnical properties, together with their physical, mechanical, chemical and mineralogical characteristics, using data science tools (the Python programming language). In addition to analyzing and discussing the hierarchical cluster dendrogram, the article compares microscopic images of the soils to identify their mineralogical similarity and checks whether that comparison agrees with the result of the cluster analysis.

II. Materials and Methods

a) Tropical Soils Studied

Thirteen fine-grained soils were studied (with a maximum of 10% of material retained on the no. 10 sieve, which has a 2.0 mm opening, according to the criteria of the MCT Methodology: M - Miniature, C - Compacted, T - Tropical). They were collected from horizon B at road or deposit areas available in the Metropolitan Region of Recife, which is composed of 15 municipalities (including the capital, Recife), as indicated in Figure 1. The MCT Methodology, created by Job Shuji Nogami and Douglas Fadul Villibor in 1980, allows an initial classification of soils into two large groups (lateritic behavior, L, and non-lateritic behavior, N), which are then subdivided into the classes LA - Lateritic Sand; LA' - Lateritic Sandy Soil; LG' - Lateritic Clay Soil; NA - Non-Lateritic Sand; NA' - Non-Lateritic Sandy Soil; NS' - Non-Lateritic Silty Soil; and NG' - Non-Lateritic Clay Soil, as described in more detail in Villibor and Nogami (2009).
© 2022 Global Journals. Clustering of Fine-Grained Tropical Soils using Data Science Tools Applied to their Geotechnical Properties