INDEX
Explanations
verbs related to defense or protection
terms related to defense and neglect
Negative Logits
Jarrett    -0.81
anamo      -0.73
itable     -0.70
tyr        -0.67
stakes     -0.65
bonds      -0.65
raltar     -0.64
Madison    -0.61
teammate   -0.61
Survivors  -0.61
Positive Logits
IJ         0.92
Ķ          0.88
rodu       0.87
fend       0.82
ij         0.80
iard       0.80
ual        0.79
)=(        0.79
eenth      0.77
abouts     0.76
Activations Density 0.024%