INDEX
Explanations
words related to superlatives or extremes
the word "most" used to signify the majority in a context
Negative Logits
vest         -0.72
pload        -0.68
icer         -0.67
Mellon       -0.67
rompt        -0.66
instead      -0.65
abad         -0.65
pton         -0.61
nton         -0.61
thur         -0.60
Positive Logits
importantly  1.25
afa          0.90
body         0.90
notably      0.89
tenance      0.81
likely       0.80
rar          0.79
observers    0.77
important    0.76
likely       0.74
Activations Density: 0.053%