Explanations
Adverbs that describe the manner or frequency of actions
Negative Logits
al         -0.81
p          -0.80
l          -0.77
k          -0.76
b          -0.74
d          -0.73
r          -0.73
es         -0.72
an         -0.72
z          -0.72

Positive Logits
sively      1.54
ently       1.50
denly       1.46
ificantly   1.45
ALLY        1.43
xically     1.42
atically    1.40
aneously    1.40
cerely      1.38
tically     1.37

Activation density: 0.661%
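Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation is greater than zero. Assuming that definition (the page does not state it), 0.661% means the feature fires on roughly 1 in 150 tokens. The sketch below uses a hypothetical helper name and made-up activations purely for illustration; it is not the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = share of tokens with activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature "fires" (activation above threshold).
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Illustrative data only: 1,000 tokens, the feature fires on 7 of them.
acts = [0.0] * 993 + [1.2, 0.4, 2.5, 0.9, 0.3, 1.7, 0.8]
print(f"Activation density: {activation_density(acts):.3f}%")  # prints 0.700%
```

A sparse feature like this one is inactive on the vast majority of tokens, so even a small number of firing positions dominates the density figure.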