INDEX
Explanations
phrases related to taking action or being taken
instances of the word "taken" in various contexts
Negative Logits
eers      -0.71
ichick    -0.68
glers     -0.67
lich      -0.63
tions     -0.60
gian      -0.60
vine      -0.56
itte      -0.55
icing     -0.55
Trees     -0.55
Positive Logits
aback      1.59
aways      1.12
advantage  1.06
care       0.97
seriously  0.89
hostage    0.88
orally     0.82
oqu        0.78
Seriously  0.77
captive    0.77
Activations Density: 0.034%