Mathematical programming models for unsupervised learning and their applications in the clustering of Brazilian school data

Authors

  • Victor Augusto do Carmo Duarte, Laboratório Nacional de Computação Científica (LNCC), Programa de Pós-Graduação em Modelagem Computacional, Petrópolis, RJ, Brasil. ORCID: https://orcid.org/0009-0005-6807-7500
  • Erito Marques de Souza Filho, Universidade Federal Fluminense (UFF), Programa de Pós-Graduação em Ciências Cardiovasculares, Niterói, RJ, Brasil. ORCID: https://orcid.org/0000-0002-0381-3344

DOI:

https://doi.org/10.35819/remat2025v11id7421

Keywords:

unsupervised learning, clustering, binary integer programming, mixed-integer linear programming, educational data

Abstract

The analysis of educational data is essential for understanding the performance of educational institutions and identifying areas for improvement. In this context, data clustering is a widely used tool, particularly when the clustering algorithms are modeled as mathematical programming problems. This paper proposes and implements three unsupervised learning algorithms, modeled as Binary Integer Programming and Mixed-Integer Linear Programming problems, to cluster data on the average performance of Brazilian schools in the National High School Exam (ENEM), published by the National Institute for Educational Studies and Research Anísio Teixeira (INEP). The models are validated by examining the characteristics of the institutions in each cluster, comparing their Socio-Economic Level Indicator (INSE) and their administrative dependence (federal, state, municipal, or private) with their school performance. The results point to the superior performance of federal public schools and private schools compared with municipal and state public schools.
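The abstract does not spell out the three formulations. As a hedged illustration of what "clustering modeled as a Binary Integer Program" can look like, the sketch below uses the classical p-median (k-medoids) model. This is an assumption for illustration, not necessarily one of the authors' models. Binary variables mark which schools act as medoids and which medoid each school is assigned to, and the objective is the total distance from schools to their medoids. Instead of calling an integer programming solver, the hypothetical function `p_median` solves tiny instances exactly by enumerating every choice of k medoids. The hypothetical helper `profile` mimics the validation step by counting how many schools of each administrative category fall in each cluster. All toy values and the category labels "A", "B", "C" are invented.

```python
"""Hypothetical sketch: clustering as a binary integer program (p-median / k-medoids).

Assumed model (for illustration only, not necessarily the authors' formulation):
    minimize    sum_i sum_j d_ij * x_ij
    subject to  sum_j x_ij = 1     for every school i  (each school in one cluster)
                x_ij <= y_j        for all i, j        (assign only to chosen medoids)
                sum_j y_j = k                          (exactly k medoids)
                x_ij, y_j in {0, 1}
For fixed y the optimal x assigns each school to its nearest chosen medoid,
so tiny instances can be solved exactly by enumerating every feasible y.
"""
from collections import Counter
from itertools import combinations
from math import dist


def p_median(points, k):
    """Return (medoid indices, labels, cost) of an optimal k-medoid clustering."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]  # Euclidean distances d_ij
    best = None
    for medoids in combinations(range(n), k):  # every y with exactly k ones
        # Optimal x for this y: each school goes to its closest chosen medoid.
        labels = [min(medoids, key=lambda j: d[i][j]) for i in range(n)]
        cost = sum(d[i][labels[i]] for i in range(n))
        if best is None or cost < best[2]:
            best = (medoids, labels, cost)
    return best


def profile(labels, categories):
    """Count schools of each administrative category inside each cluster."""
    table = {}
    for label, category in zip(labels, categories):
        table.setdefault(label, Counter())[category] += 1
    return table


# Invented toy data: (mean exam score, socio-economic index) per school.
schools = [(450, 4.0), (460, 4.2), (455, 4.1), (620, 5.8), (640, 6.0), (630, 5.9)]
kinds = ["A", "B", "A", "C", "C", "B"]  # hypothetical administrative categories

medoids, labels, cost = p_median(schools, k=2)
print("medoids:", medoids, "cost:", round(cost, 3))
for medoid, counts in profile(labels, kinds).items():
    print("cluster with medoid", medoid, dict(counts))
```

Exhaustive enumeration grows combinatorially with the number of schools, so for a real INEP dataset the same model would be passed to an integer programming solver. Features on very different scales, such as exam scores and a socio-economic index, would also normally be standardized before computing distances.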


Published

2025-05-05

Section

Mathematics

How to Cite

DUARTE, Victor Augusto do Carmo; SOUZA FILHO, Erito Marques de. Mathematical programming models for unsupervised learning and their applications in the clustering of Brazilian school data. REMAT: Revista Eletrônica da Matemática, Bento Gonçalves, RS, Brasil, v. 11, p. e301, 2025. DOI: 10.35819/remat2025v11id7421. Available at: https://periodicos.ifrs.edu.br/index.php/REMAT/article/view/7421. Accessed on: 11 Jun. 2026.