INDEX
Explanations
adverbs describing manner or frequency
Negative Logits
al          -0.88
b           -0.72
z           -0.71
p           -0.71
l           -0.71
tic         -0.71
an          -0.71
ma          -0.70
h           -0.69
k           -0.68
Positive Logits
ently       1.49
sively      1.47
denly       1.39
cerely      1.37
']")        1.36
ously       1.35
handedly    1.35
ificantly   1.33
xically     1.31
ALLY        1.31
Activations Density 0.777%
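
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the fraction of sampled token positions on which the feature's activation is above zero. This is an assumption about how the value was computed, not something the page states. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 8 of which are active.
toy = [0.0] * 992 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0, 0.7, 1.1]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.777% would mean the feature fires on roughly 8 of every 1000 sampled tokens.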