INDEX
Explanations
instances of actions taken by individuals
instances of the verb "took" in different contexts
Negative Logits
agre         -0.71
earances     -0.67
david        -0.65
eers         -0.64
Cong         -0.63
sidebar      -0.62
Smile        -0.62
ler          -0.62
density      -0.61
[+]          -0.59
Positive Logits
aways        1.08
advantage    1.03
aback        0.96
heed         0.91
refuge       0.86
care         0.85
arnaev       0.85
autions      0.84
pains        0.84
precedence   0.82
Activations Density 0.106%