INDEX
Explanations
mentions of significant health issues or concerns
Negative Logits
rg         -0.18
903        -0.17
izar       -0.15
ør         -0.15
Kore       -0.15
e          -0.15
spin       -0.14
mk         -0.14
managed    -0.14
Invalid    -0.14
Positive Logits
edla       0.18
irut       0.16
ư          0.15
ÄĻż        0.15
ribbon     0.15
ewed       0.15
ÑĤик       0.15
%@         0.15
ñana       0.14
íĺ¼        0.14
Activations Density 0.086%