Classification of Environmental Attitudes with Artificial Intelligence Algorithms
Nigmet KOKLU a,* , Suleyman Alpaslan SULAK b
a Vocational School of Technical Sciences, Konya Technical University, Konya, Türkiye
b Ahmet Kelesoglu Educational Faculty, Necmettin Erbakan University, Konya, Türkiye
ABSTRACT
The study aims to examine people's attitudes towards the environment. Environmental education provides the awareness needed to address environmental issues effectively. Attitudes towards the environment are considered very important, and negative attitudes can worsen environmental problems. For this purpose, a dataset was obtained by administering a scale consisting of 37 variables to a group of 384 participants. Using this dataset, attitudes towards the environment were analyzed with several classification algorithms: Logistic Regression (LR), Support Vector Machine (SVM) and Decision Tree (DT). The LR, SVM and DT models achieved classification accuracies of 94.53%, 92.96% and 82.55%, respectively. These classification results are at an acceptable level compared with the literature. The research thus sheds light on people's attitudes towards the environment through classification. Although the classification results are acceptable, alternative artificial intelligence approaches could also be used to improve performance.
The environment is the whole of the natural resources and ecosystems that constitute the common existence of humans and are necessary for their survival. Today, environmental problems have reached dangerous dimensions and have become a global crisis [1]. Among the main causes are people's unconscious and selfish attitudes and the careless practices that have emerged with the development of technology and industry [2]. Science and technology, which advanced rapidly in the 20th century, have raised individuals' living standards but have also caused major changes and losses in the environment. With the growth of the world population in the 21st century, environmental pollution has become an even more serious problem [3]. Solving environmental problems requires global cooperation, sustainable development and the adoption of lifestyles that respect nature [4].
Environmental degradation has negative effects that threaten the quality of life of living beings. External interventions in the ecological system disturb the natural balance and give rise to various environmental problems [5]. For this reason, it is of great importance to train teachers who have high environmental sensitivity and sufficient ecological knowledge and who can successfully conduct theoretical and applied environmental studies. Environmental education plays an important role in this regard and aims to raise individuals who are conscious of environmental problems and can contribute to solving them [6].
The main purpose of environmental education is to make individuals aware of their environment and to provide them with the knowledge, skills, values and experience needed to solve environmental problems [7-9]. This education aims to equip individuals with multidimensional life skills related to themselves and their natural environment and to raise environmentally conscious people. Raising young generations with environmental awareness is especially critical for a sustainable future [10]. The international community has begun to recognize that individuals need effective, lifelong education about the environment and environmental problems [11]. In this context, the aim is to raise conscious individuals who will contribute to solving environmental problems by disseminating environmental education programs in schools and in society [12].
With the growing importance of environmental education, the development of positive attitudes and behaviors towards the environment has also come to the fore. An attitude is the integration of an individual's feelings, thoughts and behavioral tendencies towards an object, and it can only be observed indirectly [13]. Attitudes towards the environment play an important role in shaping individuals' environmental awareness and their responsible behavior towards the environment. Individuals with a negative attitude towards the environment remain insensitive to environmental problems and cause these problems to persist [14, 15].
Environmental education is necessary to ensure the formation of positive attitudes and values towards the environment [11, 16], and environmental education programs should therefore focus on developing such attitudes in individuals. Studies show that individuals who have received environmental education are more sensitive to environmental problems and more oriented towards solving them [17]. Attitudes are among the most important factors that determine people's behavior. Environmental problems are solved not by technology or laws, but by changing individual behavior, and this change can be achieved through environmental education [18, 19].
Environmental education plays a critical role in developing individuals' positive attitudes and behaviors towards the environment. Individuals raised with environmental education will make important contributions to society as people who are sensitive to environmental problems, take an active role in solving them and are aware of the need to protect the environment [20]. Environmental education should therefore be disseminated and supported at all levels and for all age groups. Research shows that environmental education programs increase individuals' environmental awareness and help them adopt sustainable lifestyles [21, 22].
Environmental education provided from an early age creates lasting behavioral changes in individuals and helps raise an environmentally sensitive generation [23]. In addition, environmental education programs aimed at all segments of society promote environmental awareness and broad public participation in environmental protection efforts [24]. In this context, developing and implementing environmental education policies is vital for a sustainable future [25].
In this study, numerical data on environmental attitudes and behavior were analyzed. The focus is on applying various classification algorithms to these data and analyzing the results. The workflow of the study is shown in Figure 1. The dataset, the classification algorithms, the evaluation of the artificial intelligence (AI) models and other relevant details are described under separate headings in this section.
Figure 1. Schematic representation of the working process
Based on a review of the literature and accumulated knowledge in the field, the parameters needed to create the dataset were determined. A questionnaire was then designed with variables intended to measure environmentally responsible behavior. The environmental attitude dataset was collected online through this questionnaire, which the authors administered to a total of 384 people living in Turkey. The resulting dataset has 37 variables that describe the state of environmentally responsible behavior. As shown in Figure 2, the participants comprise 177 men and 207 women aged between 18 and 45. The relatively small sample of 384 participants and the fixed set of 37 variables can be regarded as limitations of the study.
Figure 2. Information belonging to the working group that created the dataset
The confusion matrix, whose entries are defined in Table 1, is used to calculate various performance metrics of a classification model, such as accuracy, precision, recall and F1 score. It is a useful tool for analyzing the strengths and weaknesses of an artificial intelligence model and can help identify areas for improvement [26-28].
Table 1. Evaluation criteria of the two-class confusion matrix
Evaluation Criteria | Definition |
True Positive (TP) | TP refers to cases where the model predicts the class as positive and the actual class is also positive. |
False Positive (FP) | FP refers to cases where the model predicts the class as positive, but the actual class is negative. |
True Negative (TN) | TN refers to cases where the model predicts the class as negative and the actual class is also negative. |
False Negative (FN) | FN refers to cases where the model predicts the class as negative, but the actual class is positive. |
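The four cells in Table 1 can be counted directly from lists of actual and predicted labels. The sketch below is a minimal illustration, assuming labels are coded 1 (positive) and 0 (negative) as in the results section; the function name `confusion_counts` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count TP, FP, TN and FN for a two-class problem (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: TP if the actual class is positive, else FP
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: FN if the actual class is positive, else TN
            counts["FN" if a == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical labels for six samples
    actual = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(confusion_counts(actual, predicted))
```

For these six samples the function reports two TPs, one FP, two TNs and one FN.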
To evaluate the trained models, the Precision (P), Recall (R), F-measure (F) and Accuracy (AC) metrics are used. These metrics are calculated from the four quantities given in Figure 3 [29, 30].
Figure 3. Confusion matrix used in the classification
Performance criteria are the metrics used to evaluate artificial intelligence models. They show how well an artificial intelligence algorithm performs a task and help identify areas that need improvement [31, 32]. Several commonly used performance criteria are described below:
Accuracy (AC): It measures the ratio of correctly classified examples in a dataset. It is a commonly used performance metric for classification problems [27].
$$AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision (P): It measures the ratio of true positives among the examples predicted as positive. It is a commonly used performance metric in cases where false positives are costly [27, 33].
$$P = \frac{TP}{TP + FP} \tag{2}$$
Recall (R): It measures the proportion of actual positives that are correctly identified. It is a performance metric commonly used in situations where false negatives are costly [28].
$$R = \frac{TP}{TP + FN} \tag{3}$$
F1-Score (F): It combines precision and recall into a single metric by taking their harmonic mean. It is often used in situations where both precision and recall are important [28, 31].
$$F = \frac{2 \times P \times R}{P + R} \tag{4}$$
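Equations (1) to (4) translate directly into code. The minimal sketch below assumes the LR counts from Table 3 with class “1” as the positive class, reading the 11 misclassified samples of class “1” as FN and the 10 misclassified samples of class “0” as FP, as described in the results. The `Counts` type and function names are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """The four confusion-matrix counts of Table 1."""
    tp: int
    fp: int
    tn: int
    fn: int


def accuracy(c: Counts) -> float:
    # Equation (1): correctly classified examples over all examples
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def precision(c: Counts) -> float:
    # Equation (2): true positives among examples predicted as positive
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    # Equation (3): true positives among examples that are actually positive
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1_score(c: Counts) -> float:
    # Equation (4): harmonic mean of precision and recall
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # LR counts from Table 3 with class "1" as the positive class
    lr = Counts(tp=158, fp=10, tn=205, fn=11)
    for name, metric in [("AC", accuracy), ("P", precision),
                         ("R", recall), ("F", f1_score)]:
        print(f"{name} = {metric(lr):.4f}")
```

This prints AC = 0.9453, P = 0.9405, R = 0.9349 and F = 0.9377. Only the accuracy matches Table 6 directly, because the other three values here describe class “1” alone rather than both classes.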
Cross-validation is a statistical technique used to evaluate the performance of artificial intelligence algorithms. The available data are split into a training set and a validation set; the model is trained on the training set and its performance is evaluated on the validation set. The most commonly used form is k-fold cross-validation, in which the available data are divided into k equal-sized parts, as seen in Figure 4 [34, 35]. The classification model is trained on k-1 parts and tested on the remaining part. This process is repeated k times so that each part serves as the validation set exactly once. The results of the k iterations are averaged to give an overall estimate of the model's performance. Cross-validation provides a more reliable estimate of how well the model generalizes to new data than a single split [36-40].
Figure 4. k-fold cross-validation applied to the dataset
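The k-fold procedure can be sketched in a few lines. The text does not state the value of k or whether the data were shuffled, so k = 10 and the seed below are assumptions; `majority_class` is a hypothetical placeholder standing in for the LR, SVM or DT models.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: Optional[int] = None) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)  # optional shuffling before splitting
    # k nearly equal parts: the first n_samples % k parts get one extra sample
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def cross_validate(X: Sequence, y: Sequence[int], k: int,
                   fit_predict: Callable[[list, list, list], List[int]],
                   seed: int = 0) -> float:
    """Train on k-1 parts, score on the held-out part, average over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed):
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in val_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, val_idx))
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


def majority_class(X_train: list, y_train: list, X_val: list) -> List[int]:
    # Hypothetical baseline: always predict the most frequent training label
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_val)


if __name__ == "__main__":
    folds = list(k_fold_indices(384, k=10, seed=42))
    print([len(val) for _, val in folds])
```

For 384 samples and k = 10, four parts contain 39 samples and six contain 38, and every sample appears in exactly one validation part.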
The LR model tree is a supervised learning model formed by combining DT and LR learning. It predicts a numerical value for a sample described by a fixed set of numerical or nominal attributes. It resembles ordinary regression trees but differs in that it constructs a piecewise linear approximation of the target function [41-43]. After all nominal attributes are replaced with binary ones, an unpruned regression tree is grown using variance reduction as the splitting criterion. Once the linear models have been fitted, each subtree is considered for replacement according to the estimated error of the corresponding linear model. A schematic diagram of the LR model is provided in Figure 5 [42, 44, 45].
Figure 5. LR AI Model
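The abstract identifies LR as Logistic Regression. The sketch below illustrates only plain logistic regression trained by batch gradient descent on the average log-loss; it does not implement the model-tree procedure described above. The learning rate, number of epochs and two-feature toy data are hypothetical and do not reflect the study's settings.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                            lr: float = 0.1,
                            epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the average log-loss; labels must be 0/1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, X: Sequence[Sequence[float]],
            threshold: float = 0.5) -> List[int]:
    # Class "1" if the predicted probability reaches the threshold
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold
            else 0 for x in X]


if __name__ == "__main__":
    # Hypothetical two-feature toy data, not the study's 37-variable dataset
    X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    w, b = fit_logistic_regression(X, y)
    print(predict(w, b, X))
```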
SVM is a supervised learning algorithm commonly used to recognize patterns in complex datasets [27]. It performs discriminative classification, learning from examples as seen in Figure 6, to predict the classes of previously unseen data [46]. It is employed in fields such as text classification, image recognition and handwritten digit recognition. The algorithm takes three basic inputs: a training dataset, the class labels of the training data and a test dataset. Several factors contribute to the popularity of SVM: it scales well to relatively large datasets and can use a wide range of function classes through so-called kernel functions [27, 47-49].
Figure 6. SVM AI Model.
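As an illustration of learning a separating hyperplane from labeled examples, the sketch below trains a linear SVM with a Pegasos-style stochastic sub-gradient method on the hinge loss. This is a swapped-in technique chosen for brevity: the kernel, regularization value and solver used in the study are not stated here, and `lam`, `iterations` and the toy data are assumptions.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, iterations: int = 2000,
                     seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the hinge loss.

    Labels must be +1/-1. A constant feature 1.0 is appended to every sample
    so the last weight acts as the bias; as a simplification, it is
    regularized together with the other weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)                        # decreasing step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]     # shrink: L2 regularization
        if margin < 1.0:                             # hinge loss is active
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w: List[float], X: Sequence[Sequence[float]]) -> List[int]:
    # Sign of the linear decision function w . [x, 1]
    return [1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1
            for x in X]


if __name__ == "__main__":
    # Hypothetical toy data with labels +1 / -1
    X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(predict_svm(w, X))
```

A kernel SVM would replace the dot product with a kernel function evaluated against support vectors; the linear case is shown only because it fits in a short, dependency-free example.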
DT induction algorithms have become increasingly popular in recent years, and a review of these algorithms was published by Murthy in 1998. Because decision trees are easy to implement and relatively easy to understand compared with other classification algorithms, they are among the most commonly used classification methods. Since a tree is constructed from the given data, the values and characteristics of the data play a significant role. Before the DT is built, the user specifies the target variable [50-53].
As seen in Figure 7, the amount of data used for classification also affects the outcome by changing the structure of the constructed tree. Typically, a large portion of the dataset is used as training data to build the tree. Collecting more training data generally increases the accuracy of the results. At each node, the data are divided into two or more subsets according to the outcome of a test on an attribute [54-56].
Figure 7. DT AI Model.
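The attribute test at each node can be illustrated by choosing a binary threshold on one numeric attribute. The split criterion used in the study is not stated, so the sketch below assumes information gain (entropy reduction), as in the ID3/C4.5 family; a full tree would apply this search to every attribute and recurse on the resulting subsets. The toy values and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: Sequence[float],
                   labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Best binary test 'value <= t' on one numeric attribute by information gain."""
    parent = entropy(labels)
    pairs = sorted(zip(values, labels))
    best: Tuple[Optional[float], float] = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        # Entropy of the children, weighted by subset size
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = parent - children
        if gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    # Hypothetical scores on one attribute and their class labels
    values = [1, 2, 3, 10, 11, 12]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_threshold(values, labels))
```

Here the split at 6.5 separates the two classes perfectly, so its information gain equals the full parent entropy of 1 bit.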
In this section, the classification results and performance evaluations of the LR, SVM and DT models shown in Figure 1 are presented and compared. The accuracies of the three models are shown in the graph in Figure 8.
In the tables reporting the evaluation results, the output class labeled “1” is treated as positive and the output class labeled “0” as negative. For each model, the confusion matrix is given and the results are interpreted.
Table 3 shows that 158 true positive samples of class “1” and 205 true negative samples of class “0” were correctly classified. The TP, TN, FP and FN values indicate that the model's predictive performance is high. According to the table, 11 samples of class “1” and 10 samples of class “0” were misclassified by the algorithm. These errors may be related to the number of input features. The number of misclassified samples is very low compared with the number of correctly classified samples. Some confusion between the classes may also arise because samples of the two classes are very similar to each other.
Table 3. The confusion matrix table of the classification made by the LR model
Actual class | Predicted 1 | Predicted 0 |
1 | 158 | 11 |
0 | 10 | 205 |
The accuracy of the LR model was calculated from the confusion matrix in Table 3. As Figure 8 shows, the model achieves a classification accuracy of 94.53%, which can be considered high compared with the literature.
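As a worked check, applying Equation (1) to the counts in Table 3 reproduces the reported accuracy:

$$AC_{LR} = \frac{158 + 205}{158 + 11 + 10 + 205} = \frac{363}{384} \approx 0.9453$$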
Table 4 shows that 152 true positive samples of class “1” and 205 true negative samples of class “0” were correctly classified. The TP, TN, FP and FN values indicate that the model's predictive performance is high. According to the table, 17 samples of class “1” and 10 samples of class “0” were misclassified by the algorithm. These errors may be related to the number of input features. The number of misclassified samples is low compared with the number of correctly classified samples. Some confusion between the classes may also arise because samples of the two classes are very similar to each other.
Table 4. The confusion matrix table of the classification made by the SVM model
Actual class | Predicted 1 | Predicted 0 |
1 | 152 | 17 |
0 | 10 | 205 |
The accuracy of the SVM model was calculated from the confusion matrix in Table 4. As Figure 8 shows, the model achieves a classification accuracy of 92.96%, which can also be considered high compared with the literature.
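Applying Equation (1) to the counts in Table 4 in the same way gives the exact value below; 92.96875% appears in the text truncated to 92.96%:

$$AC_{SVM} = \frac{152 + 205}{152 + 17 + 10 + 205} = \frac{357}{384} = 0.9296875$$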
Table 5 shows that 131 true positive samples of class “1” and 186 true negative samples of class “0” were correctly classified. The TP, TN, FP and FN values indicate that the model's predictive performance is not very high. According to the table, 38 samples of class “1” and 29 samples of class “0” were misclassified by the algorithm. These errors may be related to the number of input features. Although the number of misclassified samples is low compared with the number of correctly classified samples, fewer misclassifications would be desirable. The classes may also be confused with each other because of the similarity between samples of the two classes and the structure of the model.
Table 5. The confusion matrix table of the classification made by the DT model
Actual class | Predicted 1 | Predicted 0 |
1 | 131 | 38 |
0 | 29 | 186 |
The accuracy of the DT model was calculated from the confusion matrix in Table 5. As Figure 8 shows, the model achieves a classification accuracy of 82.55%. Compared with the literature, this classification success is at an acceptable, but not very high, level.
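Equation (1) applied to the counts in Table 5 likewise reproduces the reported value:

$$AC_{DT} = \frac{131 + 186}{131 + 38 + 29 + 186} = \frac{317}{384} \approx 0.8255$$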
The accuracy, precision, recall and F1-score of each model were obtained from its confusion matrix. The resulting values are listed in Table 6 and shown graphically in Figure 8.
Table 6. Performance metrics of the models
Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
LR | 94.53 | 94.50 | 94.50 | 94.50 |
SVM | 92.96 | 93.00 | 93.00 | 93.00 |
DT | 82.55 | 82.50 | 82.60 | 82.50 |
Figure 8. Performance rates and graphs of the study.
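Table 6 can be recomputed from Tables 3 to 5. The sketch below rests on two assumptions that the text does not state. First, the first row of each matrix holds the actual class “1” and the second row the actual class “0”: the first row sums to 169 and the second to 215 for every model, as fixed actual-class totals over 384 samples require, and this matches the text's description of the misclassified samples. Second, precision, recall and F1-score are averaged over both classes, weighted by class size. Under these assumptions, the computed values round to the Table 6 figures (for example 0.945 for all three LR metrics and 0.826 for DT recall).

```python
from typing import Dict, List

# Confusion matrices from Tables 3-5: rows = actual class, columns = predicted
# class, class order [1, 0]. The row orientation is an assumption (see above).
MATRICES: Dict[str, List[List[int]]] = {
    "LR": [[158, 11], [10, 205]],
    "SVM": [[152, 17], [10, 205]],
    "DT": [[131, 38], [29, 186]],
}


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus class-size-weighted precision, recall and F1."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))
    result = {"accuracy": correct / total, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in range(n_classes):
        support = sum(cm[k])                                 # actual members of class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # samples predicted as k
        p = cm[k][k] / predicted if predicted else 0.0
        r = cm[k][k] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        result["precision"] += weight * p
        result["recall"] += weight * r
        result["f1"] += weight * f
    return result


if __name__ == "__main__":
    for name, cm in MATRICES.items():
        metrics = weighted_metrics(cm)
        print(name, {key: round(100 * value, 2) for key, value in metrics.items()})
```

Because class-size-weighted recall always equals accuracy, the small difference between 82.55 and 82.60 for DT in Table 6 appears to come from rounding the same quantity to different precisions.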
The results in Figure 8 show that the LR model has the highest classification success and the DT model the lowest. Consistent with the accuracy values in Table 6, the LR model also has the highest precision, recall and F1-score, and the DT model the lowest.
In this study, classification was carried out with different artificial intelligence algorithms to determine people's attitudes towards the environment. The dataset has 37 features and two classes. Using data collected from 384 participants, classification was performed with the LR, SVM and DT algorithms, and cross-validation was used for training and testing. According to the accuracy values in Figure 8, the LR, SVM and DT models achieved classification accuracies of 94.53%, 92.96% and 82.55%, respectively. The classification success of the models is acceptable compared with the literature.
Classification success could be improved by increasing the number of samples in the dataset. A more balanced distribution of data between the classes may also improve classification success, and other artificial intelligence algorithms may yield different results.
This study contributes to the existing body of knowledge in several respects. Methodologically, it demonstrates the use of artificial intelligence algorithms in assessing environmental sensitivity and compares the performance of different algorithms to determine their suitability. The high classification accuracies obtained support the application of such methods in domains such as environmental sensitivity analysis.
These findings hold potential implications for the development of environmental policies and educational programs. Specifically, they can inform the design of strategies aimed at enhancing environmental awareness and sustainability practices.
Moreover, the results suggest that larger datasets and more balanced class distributions could improve classification accuracy. Future research could also explore how the choice of artificial intelligence algorithm influences classification outcomes.
In conclusion, this research highlights the promise of artificial intelligence techniques in the analysis of environmental sensitivities and underscores their relevance across various research domains.
Data Availability
The data used to support the findings of this study are available at https://nigmetkoklu.com/dataset/environmental_attitude_dataset.zip
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics statement
This is to confirm that informed consent was obtained from the respondents. When the questionnaire was distributed, respondents were explicitly informed that their participation was voluntary. They were also informed that the data they provided would be used exclusively for academic purposes, specifically for disseminating the findings through publication in scholarly journals. Furthermore, before data collection began, ethical approval was granted by the Necmettin Erbakan University Social and Behavioral Sciences Institutional Review Board Ethics Committee. The approval bears protocol number 2023-02.
References