INDEX
Explanations
phrases and terms related to harm or injury
Negative Logits
irie       -0.17
BOR        -0.17
heimer     -0.15
dõi        -0.15
CKET       -0.15
born       -0.15
ander      -0.15
ÅĻ         -0.14
isman      -0.13
si         -0.13
Positive Logits
ummer      0.16
$MESS      0.15
adol       0.15
Cv         0.15
/weather   0.14
hur        0.14
κε         0.14
/loose     0.14
aceutical  0.14
">//       0.14
Activations Density 0.035%