
A. Kasem

Here are all the papers by A. Kasem that you can download and read on OA.mg.
DOI: 10.1145/3310986.3311023
2019
Cited 55 times
Empirical Comparison of Area under ROC curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification
A common challenge encountered when performing classification and comparing classifiers is selecting a suitable performance metric. This is particularly important when the data has class-imbalance problems. The Area under the Receiver Operating Characteristic Curve (AUC) has commonly been used by the machine learning community in such situations, and recently researchers have started to use the Matthews Correlation Coefficient (MCC), especially in biomedical research. However, no empirical study has been conducted to compare the suitability of the two metrics. The aim of this study is to provide insights into how AUC and MCC compare to each other when used with classical machine learning algorithms over a range of imbalanced datasets. We utilize earlier-proposed criteria for comparing metrics, based on the degree of consistency and the degree of discriminancy, to compare AUC against MCC. We carry out experiments using four machine learning algorithms on 54 imbalanced datasets, with imbalance ratios ranging from 1% to 10%. The results demonstrate that AUC and MCC are statistically consistent with each other; however, AUC is more discriminating than MCC. The same observation is made when evaluating on 23 balanced datasets. This suggests that AUC is a better measure than MCC for evaluating and comparing classification algorithms.
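To make the two metrics concrete, here is a minimal sketch (not taken from the paper) that scores a single classifier on a synthetic imbalanced dataset with both AUC and MCC using scikit-learn; the dataset, classifier, and imbalance ratio are illustrative assumptions.

```python
# Minimal sketch (not from the paper): scoring one classifier on an
# imbalanced synthetic dataset with both AUC and MCC via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, matthews_corrcoef

# ~5% positive class, similar in spirit to the 1%-10% imbalance ratios studied
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# AUC uses the predicted scores; MCC uses the hard class predictions.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
mcc = matthews_corrcoef(y_te, clf.predict(X_te))
print(f"AUC={auc:.3f}  MCC={mcc:.3f}")
```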
DOI: 10.21608/mjmr.2024.267604.1658
2024
Study of mineral and bone disorders in Minia University Hospital dialysis patients
DOI: 10.1145/3206025.3206047
2018
Cited 14 times
A Context-Aware Late-Fusion Approach for Disaster Image Retrieval from Social Media
Natural disasters, especially those related to flooding, are global issues that attract a lot of attention in many parts of the world. A series of research ideas focusing on combining heterogeneous data sources to monitor natural disasters have been proposed, including multi-modal image retrieval. Among these data sources, social media streams are considered of high importance due to the fast and localized updates on disaster situations. Unfortunately, social media itself contains several factors that limit the accuracy of this process, such as noisy data, unsynchronized content between image and collateral text, and untrusted information, to name a few. In this research work, we introduce a context-aware late-fusion approach for disaster image retrieval from social media. Several known techniques based on context-aware criteria are integrated, namely late fusion, tuning, ensemble learning, object detection and scene classification using deep learning. We have developed a method for image-text content synchronization and spatial-temporal-context event confirmation, and evaluated the role of using different types of features extracted from internal and external data sources. We evaluated our approach using the dataset and evaluation tool offered by MediaEval2017: Emergency Response for Flooding Events Task. We have also compared our approach with other methods introduced by MediaEval2017's participants. The experimental results show that our approach is the best one when taking the image-text content synchronization and spatial-temporal-context event confirmation into account.
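Late fusion combines per-modality relevance scores only after each model has scored an item independently. The following hypothetical sketch illustrates the general idea of weighted late fusion; the weights, score values, and function names are illustrative, and the actual system additionally integrates tuning, ensemble learning, object detection, and scene classification.

```python
# Hypothetical sketch of weighted late fusion: each modality (visual features,
# text, external context) produces its own relevance score, and the scores are
# combined only at the end. Weights here are illustrative, not from the paper.
def late_fusion(scores_by_modality, weights):
    """scores_by_modality: dict modality -> {image_id: score in [0, 1]}."""
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights.get(modality, 0.0)
        for image_id, s in scores.items():
            fused[image_id] = fused.get(image_id, 0.0) + w * s
    # Rank images by fused score (higher = more likely a flooding image).
    return sorted(fused, key=fused.get, reverse=True)

ranking = late_fusion(
    {"visual": {"img1": 0.9, "img2": 0.2}, "text": {"img1": 0.6, "img2": 0.7}},
    weights={"visual": 0.7, "text": 0.3},
)
print(ranking)  # -> ['img1', 'img2']
```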
DOI: 10.1016/j.jsc.2010.10.007
2011
Cited 12 times
Morley’s theorem revisited: Origami construction and automated proof
Morley's theorem states that for any triangle, the intersections of its adjacent angle trisectors form an equilateral triangle. The construction of Morley's triangle by straightedge and compass is impossible because of the well-known impossibility result for angle trisection. However, by origami, the construction of an angle trisector is possible, and hence so is Morley's triangle. In this paper we present a computational origami construction of Morley's triangle and an automated correctness proof of the generalized Morley's theorem. During the computational origami construction, geometrical constraints in symbolic representation are generated and accumulated. Those constraints are then transformed into algebraic forms, i.e. a set of polynomials, which in turn are used to prove the correctness of the construction. The automated proof is based on the Gröbner bases method. The timings of the experiments of the Gröbner bases computations for our proofs are given. They vary greatly depending on the origami construction methods, algorithms for Gröbner bases computation, and variable orderings.
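The proof step reduces a geometric statement to polynomial algebra: the conclusion should reduce to zero modulo a Gröbner basis of the hypothesis ideal. The following toy SymPy sketch illustrates that machinery only; it is not the actual Morley constraint system, which is far larger.

```python
# Hedged illustration of the Groebner-bases proof step (toy example, not the
# actual Morley constraint system): a conclusion follows from the hypotheses
# if it reduces to 0 modulo a Groebner basis of the hypothesis ideal.
from sympy import symbols, groebner

x, y = symbols('x y')

hypotheses = [x - 1, y - x**2]      # toy "construction" constraints
conclusion = y - 1                  # toy property to verify

G = groebner(hypotheses, x, y, order='lex')
_, remainder = G.reduce(conclusion)
print("conclusion holds:", remainder == 0)   # True
```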
DOI: 10.5220/0006749106850691
2018
Cited 11 times
HealthyClassroom - A Proof-of-Concept Study for Discovering Students’ Daily Moods and Classroom Emotions to Enhance a Learning-teaching Process using Heterogeneous Sensors
This paper introduces an interactive system that discovers students' daily moods and classroom emotions to enhance the teaching and learning process using heterogeneous sensors. The system is designed to enable (1) detecting students' daily moods and classroom emotions using physiological, physical activity, and event tag data coming from wristband sensors and smart-phones, (2) discovering associations/correlations between students' lifestyle and daily moods, and (3) displaying statistical reports and the distribution of daily moods and classroom emotions of students, both in individual and group modes. A pilot proof-of-concept study was carried out using Empatica E4 wristband sensors and Android smart-phones, and preliminary evaluation and findings showing promising results are reported and discussed.
DOI: 10.1007/s00521-020-05570-7
2021
Cited 8 times
A novel ensemble method for classification in imbalanced datasets using split balancing technique based on instance hardness (sBal_IH)
DOI: 10.1007/s11704-008-0009-8
2008
Cited 14 times
Computational origami environment on the web
DOI: 10.1007/11832225_36
2006
Cited 12 times
Computational Construction of a Maximum Equilateral Triangle Inscribed in an Origami
We present an origami construction of a maximum equilateral triangle inscribed in an origami, and an automated proof of the correctness of the construction. The construction and the correctness proof are achieved by a computational origami system called Eos (E-origami system). In the construction we apply the techniques of geometrical constraint solving, and in the automated proof we apply Gröbner bases theory and the cylindrical algebraic decomposition method. The cylindrical algebraic decomposition is indispensable to the automated proof of maximality, since the specification of this property involves the notion of inequalities. The interplay of construction and proof by the Gröbner bases method and the cylindrical algebraic decomposition, supported by Eos, is the distinguishing feature of our work.
DOI: 10.1145/1982185.1982429
2011
Cited 8 times
Origami axioms and circle extension
Origami, i.e. paper folding, is a powerful tool for geometrical constructions. In 1989, Humiaki Huzita introduced six folding operations based on aligning one or more combinations of points and lines [6]. Jacques Justin, in his paper in the same proceedings, also presented a list of seven distinct operations [9]. His list included, without literal description, one extra operation not in Huzita's paper. Justin's work was written in French, and was somehow unknown among researchers. This led Hatori [5] to 'discover' the same seventh operation in 2001. Alperin and Lang in 2006 [1] showed, by exhaustive enumeration of combinations of superpositions of points and lines involved, that the seven operations are the complete combinations of the alignments. Huzita did not call his list of operations axioms. However, over the years, the term Huzita axioms, or Huzita-Justin or Huzita-Hatori axioms, has been widely used in the origami community. From a logical point of view, it is not accurate to call Huzita's original statements of folding operations axioms, because they are not always true in plane Euclidean geometry. In this paper, we present precise statements of the folding operations, by which naming them 'axioms' is logically valid, and we make some notes about the work of Huzita and Justin.
DOI: 10.1007/978-3-319-48517-1_14
2016
Cited 6 times
Empirical Study of Sampling Methods for Classification in Imbalanced Clinical Datasets
Many clinical datasets suffer from data imbalance, in which we have a large number of instances of one class and a small number of instances of the other. This problem affects most machine learning algorithms, especially decision trees. In this study, we investigated different undersampling and oversampling algorithms applied to multiple imbalanced clinical datasets. We evaluated the performance of decision tree classifiers built for each combination of dataset and sampling method. We report our experimental results; the considered oversampling methods generally outperform undersampling ones using the AUC performance measure.
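A rough, hypothetical sketch of this kind of experiment (synthetic data and default parameters, not the clinical datasets used in the study) pairs each sampling method with a decision tree and scores it with AUC using imbalanced-learn and scikit-learn.

```python
# Hypothetical sketch of the experimental setup: apply a sampling method to the
# training split only, fit a decision tree, and compare AUC across methods.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {"SMOTE": SMOTE(), "RandomOver": RandomOverSampler(),
            "RandomUnder": RandomUnderSampler()}
for name, sampler in samplers.items():
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
    tree = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)
    auc = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC={auc:.3f}")
```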
DOI: 10.1145/1244002.1244173
2007
Cited 10 times
Logical and algebraic view of Huzita's origami axioms with applications to computational origami
We describe Huzita's origami axioms from the logical and algebraic points of view. Observing that Huzita's axioms are statements about the existence of certain origami constructions, we can generate basic origami constructions from those axioms. Origami construction is performed by repeated application of Huzita's axioms. We give the logical specification of Huzita's axioms as constraints among geometric objects of origami in the language of the first-order predicate logic. The logical specification is then translated into logical combinations of algebraic forms, i.e. polynomial equalities, disequalities and inequalities, and further into polynomial ideals (if inequalities are not involved). By constraint solving, we obtain solutions that satisfy the logical specification of the origami construction problem. The solutions include fold lines along which origami paper has to be folded. The obtained solutions both in numeric and symbolic forms make origami computationally tractable for further treatments, such as visualization and automated theorem proving of the correctness of the origami construction.
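For instance, in the spirit of this formulation (an illustration, not a quotation from the paper), the fold operation that superposes a point $P=(p_1,p_2)$ onto a point $Q=(q_1,q_2)$ forces the fold line to be the perpendicular bisector of the segment $PQ$; requiring a point $X=(x,y)$ of the fold line to be equidistant from $P$ and $Q$ gives the polynomial equality

\[ (x-p_1)^2 + (y-p_2)^2 - (x-q_1)^2 - (y-q_2)^2 = 0 , \]

whose quadratic terms cancel, leaving a linear constraint on the fold line.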
DOI: 10.1007/978-3-642-40672-0_10
2013
Cited 6 times
Algebraic Analysis of Huzita’s Origami Operations and Their Extensions
We investigate the basic fold operations, often referred to as Huzita’s axioms, which represent the standard seven operations used commonly in computational origami. We reformulate the operations by giving them precise conditions that eliminate the degenerate and incident cases. We prove that the reformulated ones yield a finite number of fold lines. Furthermore, we show how the incident cases reduce certain operations to simpler ones. We present an alternative single operation based on one of the operations without side conditions. We show how each of the reformulated operations can be realized by the alternative one. It is known that cubic equations can be solved using origami folding. We study the extension of origami by introducing fold operations that involve conic sections. We show that the new extended set of fold operations generates polynomial equations of degree up to six.
DOI: 10.5220/0005555503930400
2015
Cited 6 times
Towards Gamification in Software Traceability: Between Test and Code Artifacts
With the ever-increasing dependence of our civil and social infrastructures on the correct functioning of software systems, the need for approaches to engineer reliable and validated software systems grows rapidly. Traceability is the ability to trace the influence of one software artifact on another by linking dependencies. Test-to-code traceability (relationships between test and system code) plays a vital role in the production, verification, reliability and certification of highly software-intensive dependable systems. Prior work on test-to-code traceability in contemporary software engineering environments and tools is not satisfactory, being limited with respect to results accuracy, lack of motivation, and the high effort required of developers/testers. This paper argues that new research is necessary to tackle the above weaknesses. Thus, it advocates for the introduction of gamification concepts in software traceability, and takes the position that the use of gamification metrics can contribute to software traceability tasks in validating software and critical systems. We propose a research agenda to execute this position by providing a unifying foundation for gamified software traceability that combines self-adaptive, visualization, and predictive features for trace links.
DOI: 10.1055/a-1946-0157
2022
Cited 3 times
The Development and Validation of Artificial Intelligence Pediatric Appendicitis Decision-Tree for Children 0 to 12 Years Old
Introduction: Diagnosing appendicitis in young children (0–12 years) still poses a special difficulty despite the advent of radiological investigations. Few scoring models have evolved and been applied worldwide, but with significant fluctuations in accuracy upon validation. Aim: To utilize artificial intelligence (AI) techniques to develop and validate a diagnostic model based on clinical and laboratory parameters only (without imaging), in addition to prospective validation to confirm the findings. Methods: In Stage-I, observational data of children (0–12 years) referred for acute appendicitis (March 1, 2016–February 28, 2019, n = 166) was used for model development and evaluation, using the 10-fold cross-validation (XV) technique to simulate a prospective validation. In Stage-II, prospective validation of the model and the XV estimates was performed (March 1, 2019–November 30, 2021, n = 139). Results: The developed model, the AI Pediatric Appendicitis Decision-tree (AiPAD), is both accurate and explainable, with an XV estimate of average accuracy of 93.5% ± 5.8 (91.4% positive predictive value [PPV] and 94.8% negative predictive value [NPV]). Prospective validation revealed that the model was indeed accurate and close to the XV evaluations, with an overall accuracy of 97.1% (96.7% PPV and 97.4% NPV). Conclusion: AiPAD is validated, highly accurate, easy to comprehend, and offers an invaluable tool for diagnosing appendicitis in children without the need for imaging. Ultimately, this would lead to significant practical benefits, improved outcomes, and reduced costs.
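As a hedged sketch of the Stage-I evaluation strategy (the features and labels below are random placeholders, not the study's clinical and laboratory parameters), a decision tree can be assessed with 10-fold cross-validation in scikit-learn:

```python
# Hedged sketch of the Stage-I evaluation idea: estimate a decision tree's
# accuracy with 10-fold cross-validation. Features/labels are placeholders,
# not the study's clinical and laboratory parameters.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(166, 8))            # placeholder for 166 referred children
y = rng.integers(0, 2, size=166)         # placeholder appendicitis labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```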
DOI: 10.1007/978-3-030-03302-6_16
2018
Cited 6 times
A Data Mining Approach for Inventory Forecasting: A Case Study of a Medical Store
One of the factors that often result in an unforeseen shortage or expiry of medication is the absence of, or continued use of ineffective, inventory forecasting mechanisms. An unforeseen shortage of potentially lifesaving medication can translate into a loss of lives, while overstocking can affect both medical budgeting and healthcare provision. Evidence from the literature indicates that forecasting techniques can be a robust approach to address this inventory management challenge. The purpose of this study is to propose an inventory forecasting solution based on time series data mining techniques applied to transactional data of medical consumption. Four different machine learning algorithms for time series analysis were explored and their forecasting accuracy estimates were compared. Results reveal that Gaussian Processes (GP) produced better results than the other explored techniques (Support Vector Machine Regression (SMOreg), Multilayer Perceptron (MLP) and Linear Regression (LR)) for four-weeks-ahead prediction. The proposed solution is based on secondary data and can be replicated or altered to suit different constraints of other medical stores. This work therefore suggests that the use of data mining techniques could prove a feasible solution to a prevalent challenge in the medical inventory forecasting process. It also outlines the steps to be taken in this process and proposes a method to estimate forecasting risk that helps in deploying the obtained results in the respective domain area.
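A minimal, hypothetical sketch of such a comparison (a synthetic weekly consumption series, lag features, and default hyperparameters; the study's actual preprocessing and tuning are not reproduced) could look like this with scikit-learn regressors:

```python
# Hypothetical sketch: compare regressors for 4-weeks-ahead prediction on
# lagged weekly consumption data (synthetic series, illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
weeks = np.arange(200)
series = 50 + 10 * np.sin(weeks / 8) + rng.normal(0, 2, size=200)  # fake demand

# Build lag features: predict consumption 4 weeks ahead from the last 8 weeks.
lags, horizon = 8, 4
X = np.array([series[i:i + lags] for i in range(len(series) - lags - horizon)])
y = series[lags + horizon:]
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

models = {"GP": GaussianProcessRegressor(), "SVR": SVR(),
          "MLP": MLPRegressor(max_iter=2000, random_state=0),
          "LR": LinearRegression()}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.2f}")
```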
DOI: 10.1007/978-3-642-25070-5_5
2011
Cited 5 times
Proof Documents for Automated Origami Theorem Proving
A proof document for origami theorem proving is a record of the entire process of reasoning about an origami construction and theorem proving. It is produced at the completion of origami theorem proving as a kind of proof certificate. It describes in detail how the whole process of an origami construction and the subsequent theorem proving is carried out in our computational origami system. In particular, it describes the logical and algebraic transformations of the prescription of the origami construction into mathematical models that in turn become amenable to computation and verification. The structure of the proof document is detailed using an illustrative example that reveals the importance of such a document in the analysis of origami construction and theorem proving.
DOI: 10.1103/physrevd.98.095019
2018
Cited 4 times
Long-lived B−L symmetric SSM particles at the LHC
We investigate the collider signatures of neutral and charged Long-Lived Particles (LLPs), predicted by the Supersymmetric $B-L$ extension of the Standard Model (BLSSM), at the Large Hadron Collider (LHC). The BLSSM is a natural extension of the Minimal Supersymmetric Standard Model (MSSM) that can account for non-vanishing neutrino masses. We show that the lightest right-handed sneutrino can be the Lightest Supersymmetric Particle (LSP), while the Next-to-the LSP (NLSP) is either the lightest left-handed sneutrino or the left-handed stau, which are natural candidates for the LLPs. We analyze the displaced vertex signature of the neutral LLP (the lightest left-handed sneutrino), and the charged tracks associated with the charged LLP (the left-handed stau). We show that the production cross sections of our neutral and charged LLPs are relatively large, namely of order ${\cal O}(1)~{\rm fb}$. Thus, probing these particles at the LHC is quite plausible. In addition, we find that the displaced di-lepton associated with the lightest left-handed sneutrino has a large impact parameter that discriminates it from other SM leptons. We also emphasize that the charged track associated with the left-handed stau has a large momentum with slow moving charged tracks, hence it is distinguished from the SM background and therefore it can be accessible at the LHC.
DOI: 10.1007/978-3-030-03302-6_19
2018
Cited 3 times
Implementation of Low-Cost 3D-Printed Prosthetic Hand and Tasks-Based Control Analysis
A functional prosthetic hand can cost up to £10,000, which limits its access to many amputees. With the advancement of technology, one possibility for overcoming this issue lies in the use of 3D printing. A 3D printer can reduce the production cost significantly, to less than £400, for models that can achieve basic functionalities. There have been several developments of 3D-printed prosthetic hands and arms, and some of them have been made open source. This paper presents work in progress on implementing a 3D-printed prosthetic hand based on an open-source model, describes some of the important issues and challenges faced, and carries out a tasks-based control analysis for some activities of daily living; namely those that depend on power, tip, lateral, and spherical grasps.
DOI: 10.3303/cet2183060
2021
Cited 3 times
Convolution Recurrent Neural Network for Daily Forecast of PM10 Concentrations in Brunei Darussalam
PM10 is particulate matter with an aerodynamic diameter less than or equal to 10 µm. It is one of the primary pollutants contributing to the ambient air quality level. Air quality monitoring in Brunei Darussalam uses only PM10 concentrations to measure the nation's daily Pollutant Standard Index (PSI). This study sheds light on the data-centric landscape of air pollution prediction in Brunei Darussalam, highlights potential uses of forecasting daily PM10 concentrations, and presents comparisons of prediction models built using several methods, namely: moving average, linear regression, recurrent neural network (RNN), long short-term memory (LSTM), LSTM with 1D convolutions, and convolutional recurrent neural network (CRNN). The study uses daily PM10 concentrations obtained from the air quality monitoring stations located in every district of Brunei Darussalam over a period of 15 years (2005-2019).
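As a hedged sketch of one of the compared architectures, a convolutional recurrent network can be built over a window of past daily concentrations; the window length, layer sizes, and synthetic series below are assumptions, not the paper's configuration.

```python
# Hedged sketch of a CRNN forecaster: 1D convolutions over a window of past
# daily PM10 values feed an LSTM, which predicts the next day's concentration.
# Layer sizes and window length are illustrative, not the paper's settings.
import numpy as np
import tensorflow as tf

window = 14                              # past 14 days as input
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),            # next-day PM10 concentration
])
model.compile(optimizer="adam", loss="mse")

# Synthetic stand-in for a daily PM10 series.
series = 40 + 10 * np.sin(np.arange(1000) / 30) + np.random.default_rng(0).normal(0, 3, 1000)
X = np.array([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```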
DOI: 10.1063/5.0110448
2023
An automatic photometric augmentation technique to recognize faces with single sample per person
Muhammad Tariq Siddique, Ibrahim Venkat, Asem Kasem, Sharul Tazrajiman; An automatic photometric augmentation technique to recognize faces with single sample per person. AIP Conference Proceedings, 10 January 2023; 2643 (1): 040011. https://doi.org/10.1063/5.0110448
DOI: 10.1016/j.dark.2023.101336
2023
Primordial gravitational waves in generalized Palatini gravity
Extended Palatini gravity is the metric-affine gravity theory characterized by zero torsion, nonzero metricity, and a term quadratic in the antisymmetric Ricci curvature. It reduces dynamically to general relativity plus a geometric Proca field. In this work, we study imprints of the geometric Proca field on gravitational waves. Our results show that the geometric Proca field leaves significant signatures in the gravitational wave signal, and the gravitational wave energy density could be large enough to be detectable by the next upgrade of existing GW detectors. Our results, if confirmed observationally, will be an indication that gravity could be non-Riemannian in nature.
DOI: 10.1055/s-0043-1776976
2023
Erratum to: The Development and Validation of Artificial Intelligence Pediatric Appendicitis Decision-Tree for Children 0 to 12 Years Old
Correction to: The Development and Validation of Artificial Intelligence Pediatric Appendicitis Decision-Tree for Children 0 to 12 Years Old. Eur J Pediatr Surg 2023; 33(05): 395-402. DOI: 10.1055/a-1946-0157
DOI: 10.1007/s00431-023-05390-6
2023
AI-augmented clinical decision in paediatric appendicitis: can an AI-generated model improve trainees’ diagnostic capability?
DOI: 10.1109/icca59364.2023.10401372
2023
Eye Tracking System Using Convolutional Neural Network
Several diseases and disabilities, including cerebral palsy (CP), cause patients to lose their ability to communicate easily with the outside world. In this paper we aim to detect the movement of the iris, which can be used to move a wheelchair for CP patients, to write sentences by simply looking at an interactive screen, or even to move an artificial arm. The two-dimensional CNN model and the coding method we propose track the eye movement, specifically the movement of the iris, using the VGG Image Annotator software and a dataset of images of several people's eyes. A dataset of 500 images from several patients, in which only the eyes were cropped and studied, was used to train the model for eye movement detection. The proposed model achieved a mean squared error drop of about 49,720 from the initial value of 50,000.
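A hedged sketch of the kind of model described, a small 2D CNN that regresses the iris (x, y) position from a cropped eye image; the input size and layer configuration are assumptions, not taken from the paper.

```python
# Hedged sketch: a small 2D CNN that regresses the iris (x, y) position from a
# cropped eye image. Architecture and input size are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),        # grayscale eye crop
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),                         # (x, y) iris coordinates
])
model.compile(optimizer="adam", loss="mse")           # trained on annotated crops
```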
2018
Gaining Insights on Nasopharyngeal Carcinoma Treatment Outcome Using Clinical Data Mining Techniques.
The analysis of Electronic Health Records (EHRs) is attracting a lot of research attention in the medical informatics domain. Hospitals and medical institutes started to use data mining techniques to gain new insights from the massive amounts of data that can be made available through EHRs. Researchers in the medical field have often used descriptive statistics and classical statistical methods to prove assumed medical hypotheses. However, discovering new insights from large amounts of data solely based on experts' observations is difficult. Using data mining techniques and visualizations, practitioners can find hidden knowledge, identify interesting patterns, or formulate new hypotheses to be further investigated. This paper describes a work in progress on using data mining methods to analyze clinical data of Nasopharyngeal Carcinoma (NPC) cancer patients. NPC is the fifth most common cancer among Malaysians, and the data analyzed in this study was collected from three states in Malaysia (Kuala Lumpur, Sabah and Sarawak), and is considered to be the largest up-to-date dataset of its kind. This research is addressing the issue of cancer recurrence after the completion of radiotherapy and chemotherapy treatment. We describe the procedure, problems, and insights gained during the process.
DOI: 10.1109/icaccaf.2018.8776690
2018
Learning Analytics in Universiti Teknologi Brunei: Predicting Graduates Performance
Research in Educational Data Mining has enabled many applications that have positively impacted teaching, learning, and their management processes. This study uses data mining techniques to study the performance of full-time undergraduate students in the School of Computing and Informatics at Universiti Teknologi Brunei (UTB). Two aspects of students' performance are focused on: first, predicting undergraduates' performance at an early stage of their study program; second, identifying modules that can serve as strong indicators of performance at the end of the degree program. We have collected data on students' academic performance throughout the four years of their study, starting from 2009, as well as related demographic and background information. We present the approach we have taken to answer the two research questions identified for this study. Several classification techniques, and sampling methods to overcome data imbalance challenges, have been experimented with. Despite the small data size available, we achieved reasonable accuracy in predicting the three graduation classifications adopted in UTB (average true positive rate of 0.754), using the Naïve Bayes method with a feature selection technique based on the Gain Ratio attribute evaluator. Overall, modules in semesters 2 to 4 are more prominent than first-semester modules in serving as strong predictors. We also draw some conclusions from insights observed in the best decision tree model.
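A hedged sketch of the modelling step: Naïve Bayes combined with information-theoretic feature selection and 10-fold cross-validation. The study used Weka's Gain Ratio attribute evaluator; SelectKBest with mutual information is substituted here only as a rough scikit-learn stand-in, and the data is a random placeholder.

```python
# Hedged sketch: Naive Bayes with information-theoretic feature selection.
# The study used Weka's Gain Ratio evaluator; SelectKBest with mutual
# information is only a rough stand-in for illustration.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)  # placeholder records

pipe = make_pipeline(SelectKBest(mutual_info_classif, k=8), GaussianNB())
scores = cross_val_score(pipe, X, y, cv=10, scoring="recall_macro")
print(f"average true positive rate: {scores.mean():.3f}")
```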
2018
Leveraging Content and Context in Understanding Activities of Daily Living.
DOI: 10.1007/978-981-33-4069-5_21
2021
Split Balancing (sBal)—A Data Preprocessing Sampling Technique for Ensemble Methods for Binary Classification in Imbalanced Datasets
The problem of class imbalance in machine learning occurs when there is a large disproportion in the distribution of classes in the data for classification tasks. In many real-world domains, such as healthcare, finance, and predictive maintenance, the number of data points of a less important class (usually the negative class) is much higher than that of the class of greater interest (usually the positive or target class). This affects the ability of many learning algorithms to find good classification models. To address this, many approaches have been proposed, prominently including ensemble methods integrated with sampling-based techniques. However, these methods are still prone to the negative effects of sampling-based techniques that alter class distributions via over-sampling or under-sampling, which can lead to overfitting or discarding useful data, respectively, and thus affect performance. In this paper, we propose a new data preprocessing sampling technique, dubbed sBal, for ensemble methods for binary classification on imbalanced datasets. Our proposed method first turns the imbalanced dataset into several balanced bins/bags. Multiple base learners are then induced on the balanced bags, and finally the classification results are combined using a specific ensemble rule. We evaluated the performance of our proposed method on 50 imbalanced real-world binary datasets and compared its performance with well-known ensemble methods that utilize data preprocessing techniques, namely SMOTEBagging, SMOTEBoost, RUSBoost, and RAMOBoost. The results reveal that the proposed method brings considerable improvement in classification performance relative to the compared methods. We performed statistical significance analysis using Friedman's non-parametric statistical test with the Bergman post-hoc test. The analysis showed that our method performed significantly better than the majority of the methods across many datasets, suggesting a better preprocessing approach than the ones used in the compared methods. We also highlight possible extensions to the method that can improve its effectiveness.
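A hedged sketch of the core split-balancing idea as the abstract describes it (partition the majority class into bins, pair each bin with all minority instances to form balanced bags, train one base learner per bag, and combine by voting); the base learner and the specific ensemble rule are assumptions.

```python
# Hedged sketch of the split-balancing idea: partition the majority class into
# bins, pair each bin with all minority instances, train one learner per
# balanced bag, and combine predictions by majority vote. The specific ensemble
# rule and base learner are assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sbal_fit(X, y, minority_label=1, random_state=0):
    rng = np.random.default_rng(random_state)
    minority = np.where(y == minority_label)[0]
    majority = rng.permutation(np.where(y != minority_label)[0])
    n_bags = max(1, len(majority) // len(minority))
    learners = []
    for bin_idx in np.array_split(majority, n_bags):
        idx = np.concatenate([bin_idx, minority])      # one balanced bag
        learners.append(DecisionTreeClassifier(random_state=random_state)
                        .fit(X[idx], y[idx]))
    return learners

def sbal_predict(learners, X):
    votes = np.stack([clf.predict(X) for clf in learners])
    return (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote
```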
DOI: 10.21123/bsj.2014.11.3.1361-1366
2014
Comparison the effectiveness of using a magnetic field to control the Coli phages isolated from rivulet water with water-treatment using magnetic field added iron filings
The present study aimed to use magnetic fields and nanotechnology in the field of water purification, which offers high efficiency in removing biological contaminants such as viruses and bacteria, rather than relying on chemical and physical treatments such as chlorine, bromine, ultraviolet light, boiling, sedimentation, distillation, ozone and others that have a direct negative impact on human safety and the environment. The presence of Coli phages in the water samples under study was investigated using the single agar layer method, and the samples positive for phages were then treated with three types of fixed magnetic field (north pole, south pole, bipolar). The results were compared with water samples treated under the same magnetic field conditions with the addition of powdered iron filings. The results indicated a synergistic effect of using a magnetic field with iron filings, with an efficiency of up to 100%, better than using a magnetic field alone, as shown by the disappearance of plaque spots in the second treatment. This enhances the possibility of manufacturing iron nanoparticle pipes for pumping water in water treatment, owing to their high surface area (surface/volume ratio). It is suggested that this approach could be used in the future in a wide range of water purification applications and may be the best option for wastewater treatment.
2011
Studies on computational origami: Web environment and extension of foldability
2007
Symposium on Multiagent Systems, Robotics and Cybernetics: Theory and Practice
2019
Empirical Comparison of Area under ROC Curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification [JST / Kyoto University machine translation]
2019
Empirical Comparison of Area under ROC curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification
2006
Logical and Algebraic Formulation of Origami Axioms
We describe Huzita's origami axioms from the logical and algebraic points of view. Observing that Huzita's axioms are statements about the existence of certain origami constructions, we can generate basic origami constructions from those axioms. We give the logical specification of Huzita's axioms as constraints among geometric objects of origami in the language of first-order predicate logic. The logical specification is then translated into logical combinations of algebraic forms, i.e. polynomial equalities, disequalities and inequalities, and further into polynomial ideals (if inequalities are not involved). Origami construction is performed by repeated application of Huzita's axioms. By constraint solving, we obtain solutions that satisfy the logical specification of the origami construction problem. The solutions include fold lines along which the origami has to be folded. The obtained solutions, in both numerical and symbolic forms, make origami computationally tractable for further treatment, such as visualization and automated theorem proving of the correctness of the origami construction.
DOI: 10.1007/978-3-030-68133-3_11
2021
A Robust Ensemble Method for Classification in Imbalanced Datasets in the Presence of Noise
Noise and class imbalance are two common data characteristics related to the quality and nature of many real-world data sources, and they usually negatively affect the performance of many machine learning classification algorithms. A wide range of studies have investigated the problems of class imbalance and noise in isolation, and very few have studied their combined effect. In this paper, we propose a robust bagging-based ensemble method that tries to pay attention to both problems combined. The proposed method is based on the idea of Balanced Bagging (BB) to balance the bootstraps, but with a different sampling process, in which the probability of selecting an instance is based on its level of hardness, i.e. the probability of an instance being misclassified irrespective of the choice of classifier. The approach of the proposed method is based on estimating the hardness of each instance in a training dataset, and ensuring that bootstraps are balanced and at the same time have instances of varying degrees of hardness (Easy, Normal, and Hard). We evaluate the performance of the proposed method on 30 synthetic imbalanced datasets with different levels of noise and imbalance ratios and compare its performance against the BB method. We observe that the proposed method performs significantly better than BB regardless of the noise level or imbalance ratio. Furthermore, we calculate the Equalized Loss of Accuracy (ELA) to assess the robustness of both methods under different levels of noise. The results indicate that the proposed method is more robust (not affected by noise as much) compared to BB. The Wilcoxon signed-rank statistical test shows that there is a significant difference in both performance and robustness between the proposed method and BB, suggesting that representing varying levels of hardness in bootstraps is a better bootstrapping approach that improves the performance of ensemble methods.
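A hedged sketch of the sampling idea, with instance hardness approximated by k-disagreeing neighbours (the fraction of an instance's nearest neighbours that carry a different label); the paper's exact hardness estimator and bootstrap composition are not reproduced.

```python
# Hedged sketch: estimate instance hardness via k-disagreeing neighbours (kDN),
# then draw a balanced bootstrap that mixes easy, normal and hard instances.
# The paper's exact hardness measure and bag composition are not reproduced.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kdn_hardness(X, y, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # first neighbour is the point itself
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

def balanced_hardness_bootstrap(X, y, hardness, size_per_class, random_state=0):
    rng = np.random.default_rng(random_state)
    chosen = []
    for label in np.unique(y):
        cls = np.where(y == label)[0]
        p = 0.1 + hardness[cls]               # keep some mass on easy instances
        chosen.append(rng.choice(cls, size=size_per_class, p=p / p.sum()))
    return np.concatenate(chosen)             # indices of one balanced bootstrap
```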