INDEX
Explanations
actions or activities performed by individuals
past tense verbs and their associated actions
Negative Logits
been       -0.74
cedented   -0.70
league     -0.68
arta       -0.63
veland     -0.62
Alley      -0.60
ibel       -0.60
enta       -0.59
eda        -0.59
thur       -0.57
Positive Logits
tremend          0.82
nesday           0.68
DEV              0.63
unsuccessfully   0.62
themselves       0.59
JV               0.57
him              0.55
brilliantly      0.54
unnecess         0.54
ãĤ¤ãĥĪ           0.54
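
The last positive-logit token, `ãĤ¤ãĥĪ`, looks like a raw vocabulary entry from a byte-level BPE tokenizer rather than corrupted text. The sketch below assumes the model uses the GPT-2 style byte-to-character mapping (the page does not name the tokenizer, so this is an assumption) and reverses that mapping to recover the underlying UTF-8 string. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character map used by GPT-2 style byte-level BPE.

    Assumption: the tokenizer behind this page follows the GPT-2 convention.
    """
    # Bytes that are already printable map to themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse map: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can split a multi-byte character, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¤ãĥĪ"))
```

Under that assumption the token decodes to the katakana pair `イト`. The other tokens in both lists are plain ASCII and decode to themselves.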
Activations Density 0.653%
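
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The following minimal sketch computes that fraction on synthetic data. The function name `activation_density` and the `threshold` parameter are hypothetical, and the input list is invented to illustrate the arithmetic, not taken from the feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means "share of tokens where the feature fires".
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 653 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 653 + [0.0] * 99_347
print(f"{activation_density(sample):.3%}")
```

On this synthetic sample the printed value is `0.653%`, which corresponds to roughly one active token in every 153.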