Explanations
mentions of defensive actions or strategies
phrases related to confrontation or combat scenarios
Negative Logits
urches        -0.77
encount       -0.67
IVERS         -0.67
pioneers      -0.65
selves        -0.65
millenn       -0.65
proponents    -0.61
saints        -0.61
promoters     -0.61
arij          -0.60
Positive Logits
theirs        0.96
himself       0.90
hers          0.88
herself       0.71
whom          0.68
Himself       0.65
heit          0.65
his           0.64
him           0.62
stretched     0.60
Activations Density 0.885%
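
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading on made-up data; the function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 1000 positions, 9 of them active.
toy = [0.0] * 991 + [1.3] * 9
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.885% would mean the feature fires on roughly 9 of every 1000 sampled tokens.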