Explanations
references to actions and activities carried out by individuals
Negative Logits
Token          Logit
—-             -0.65
ensing         -0.65
etting         -0.64
prosecuting    -0.64
ocus           -0.63
fray           -0.62
AU             -0.61
Voters         -0.59
iaz            -0.58
oyer           -0.58
Positive Logits
Token          Logit
been           1.31
undergone      1.18
been           1.12
gotten         1.03
access         0.98
gone           0.95
Been           0.93
eaten          0.91
kell           0.89
recourse       0.89
Activations Density: 0.352%