INDEX
Explanations
phrases related to protection and defense
Negative Logits
miah     -0.79
ftime    -0.78
raq      -0.76
meal     -0.74
chin     -0.72
ynes     -0.72
ppa      -0.71
iken     -0.69
unny     -0.68
Clar     -0.68
Positive Logits
harm          1.18
impending     1.11
dangers       1.06
future        1.04
predators     1.03
undue         1.01
repr          1.01
exploitation  1.00
imminent      0.98
lawsuits      0.98
Activations Density 0.113%