INDEX
NEGATIVE LOGITS
patience    -0.08
allo        -0.08
habits      -0.08
kön         -0.08
qua         -0.08
allon       -0.08
synonym     -0.08
씀          -0.08
coqu        -0.07
Congress    -0.07
POSITIVE LOGITS
Avoid       0.10
prevented   0.10
securely    0.10
Saf         0.10
સુર         0.10
sanitized   0.10
Saf         0.10
ಸುರ         0.10
নিরাপ       0.10
keamanan    0.10
Activations Density 0.003%