INDEX
NEGATIVE LOGITS
DEV           0.40
env           0.39
跑步          0.38
亙            0.38
lettera       0.37
thermodynam   0.37
ivided        0.36
types         0.36
letra         0.36
volatiles     0.36
POSITIVE LOGITS
장애          0.52
gender        0.51
disability    0.48
الجنس         0.47
religion      0.46
gender        0.46
creed         0.44
Disabilities  0.44
Gender        0.43
orientación   0.42
Activations Density 0.003%