2023

Plan

  • Two 3-dimensional Gaussian balls clustering
  • Two 32-dimensional Gaussian balls clustering
  • Autoencoder for 32-dimensional data

Two 3D Gaussian balls, 80,000 data points

Distributions of component 1 and component 3, with their 2D density plot.
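
A minimal sketch of how such a data set could be generated, using only the Python standard library. The centers, the unit standard deviation, and the helper name `make_two_balls` are assumptions for illustration; the slides do not state how the balls were parameterized.

```python
import random

def make_two_balls(n_points, center_a, center_b, sigma=1.0, seed=0):
    """Sample n_points from two isotropic Gaussian balls, half from each.

    Returns (points, labels): points is a list of coordinate lists,
    labels is 0 for ball A and 1 for ball B.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n_points):
        label = i % 2  # alternate so each ball gets n_points // 2 samples
        center = center_b if label else center_a
        points.append([rng.gauss(mu, sigma) for mu in center])
        labels.append(label)
    return points, labels

# Hypothetical centers: the slides do not give the actual values.
points, labels = make_two_balls(
    n_points=80_000,
    center_a=[-1.0, -1.0, -1.0],
    center_b=[1.0, 1.0, 1.0],
)
```

Passing 32-element centers to the same function gives the 32-dimensional case. At this size NumPy would be the more natural tool; the standard library is used here only to keep the sketch dependency-free.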

Gating on 2D projection

The separation line is calculated from the centers of the two Gaussian balls.
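
One way to read "calculated from the centers" is as the perpendicular bisector of the segment joining the two projected centers. The sketch below assumes that reading; the function names and the four sample points are hypothetical and only illustrate the gate and the mis-clustering rate.

```python
def gate_by_bisector(points_2d, center_a, center_b):
    """Label each 2D point 0 or 1 by the side of the perpendicular
    bisector of the segment center_a -> center_b on which it falls."""
    mid = [(a + b) / 2 for a, b in zip(center_a, center_b)]
    direction = [b - a for a, b in zip(center_a, center_b)]
    labels = []
    for p in points_2d:
        # Signed projection of (p - mid) onto the center-to-center direction.
        side = sum((x - m) * d for x, m, d in zip(p, mid, direction))
        labels.append(1 if side > 0 else 0)
    return labels

def misclustering_rate(predicted, truth):
    """Fraction of points whose gated label differs from the true label."""
    wrong = sum(p != t for p, t in zip(predicted, truth))
    return wrong / len(truth)

# Tiny hand-made example in the (component 1, component 3) plane.
center_a, center_b = (-1.0, -1.0), (1.0, 1.0)
points_2d = [(-1.2, -0.7), (0.9, 1.4), (0.3, 0.1), (-0.2, -0.5)]
truth = [0, 1, 0, 1]
predicted = gate_by_bisector(points_2d, center_a, center_b)
rate = misclustering_rate(predicted, truth)  # 2 of 4 points land on the wrong side
```

For two balls of equal size and spread, the bisector is the natural choice because every point is assigned to the nearer center.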

3D Gaussian balls - 3D plot

3D rendering of the two 3D Gaussian balls.

2D UMAP plot of 3D Gaussian balls

UMAP separates the two clusters better than gating on the 2D projection, and the mis-clustering rate improves.

Two 32D Gaussian balls, 80,000 data points

Distributions of component 1 and component 32, with their 2D density plot.

32D Gaussian balls - Gating on 2D projection

Gating on the 2D projection does not appear feasible. The calculated line cuts the oval-shaped point cloud along its shorter direction, and the resulting gate mis-clusters more than 20% of the points.
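
A rough way to see why the gate fails: if the two projected clusters are assumed to be equally weighted isotropic Gaussians with the same standard deviation (an assumption, not something the slides state), the error of the perpendicular-bisector gate depends only on the ratio of the projected center distance to that standard deviation. The function name `bisector_error` is hypothetical.

```python
import math

def bisector_error(delta, sigma=1.0):
    """Expected mis-clustering rate of the perpendicular-bisector gate for
    two equally weighted isotropic Gaussians whose projected centers are
    `delta` apart and whose per-axis standard deviation is `sigma`."""
    # Phi(-delta / (2 * sigma)), written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * sigma * math.sqrt(2.0)))

# Error as a function of the projected separation, in units of sigma.
for delta in (0.5, 1.0, 1.68, 3.0, 4.0):
    print(f"delta/sigma = {delta:4.2f} -> error = {bisector_error(delta):.3f}")
```

Under these assumptions the error exceeds 20% whenever the projected centers are closer than about 1.68 standard deviations. A 2D projection keeps only the part of the 32-dimensional center separation that lies in the two plotted components, so the projected clusters can overlap heavily even when the full-dimensional balls are well apart.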

32D Gaussian balls - 3D projection

While the 3D projection shows more separation than the 2D projection, significant overlap is still visible.

2D UMAP of 32D Gaussian balls

UMAP separates the two clusters completely, and the mis-clustering rate is negligible.

Autoencoder design

Autoencoder design. The network has about 600 parameters, and the latent space is two-dimensional.
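
The layer widths of the network are not recoverable from this text. As an illustration only, the sketch below counts the parameters of a hypothetical fully connected layout, 32 -> 8 -> 2 -> 8 -> 32, which is one layout that lands near the stated total. The hidden width of 8 and the helper name `dense_param_count` are assumptions.

```python
def dense_param_count(layer_widths):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_widths, layer_widths[1:])
    )

# Hypothetical widths: 32 inputs, a 2-dimensional latent space, 32 outputs.
# The hidden width of 8 is a guess chosen to land near "about 600".
widths = [32, 8, 2, 8, 32]
total = dense_param_count(widths)  # 264 + 18 + 24 + 288 = 594
```

Other layouts give similar totals, so the count alone does not pin down the design.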

Autoencoder training

Reconstructed data

The reconstructed data show cluster separation even in 1D and 2D projections.

Reconstructed data - Gating on 2D projection

Reconstructed data gated on a 2D projection. The mis-clustering rate is extremely low, which was not possible with the original data.

2D UMAP of reconstructed data

The two clusters in the UMAP are closer together than with the original data. The mis-clustering rate is comparable.

Encoded data in 2d latent space

The encoded data in the latent space, colored by cluster number. The two clusters are clearly separated.
