INDEX
Explanations
instances where actions are being taken
instances of the article "a" in various contexts
Negative Logits
horizont    -0.69
times       -0.66
Autom       -0.65
Attach      -0.65
attributes  -0.65
greets      -0.64
CI          -0.63
eyed        -0.63
tions       -0.62
peed        -0.62
Positive Logits
mistake       0.92
contribution  0.88
fuss          0.88
difference    0.85
distinction   0.85
comeback      0.85
lot           0.84
habit         0.83
cameo         0.81
pact          0.81
Activations Density 0.079%