INDEX
Explanations
phrases identifying specific individuals
references to individuals specifically using the word "himself," "herself," or "themselves."
Negative Logits
Token         Logit
onal          -0.83
olid          -0.71
ysical        -0.70
CLOSE         -0.69
SHIP          -0.66
Frenzy        -0.63
convergence   -0.63
ammy          -0.62
RELEASE       -0.61
odies         -0.60
Positive Logits
Token         Logit
admitted      0.94
confessed     0.88
admits        0.86
acknowledged  0.84
龍喚士        0.81
conceded      0.77
penned        0.77
profess       0.75
doubted       0.74
contradicted  0.74
Activations Density 0.039%