INDEX
Explanations
phrases indicating a high degree of quality or significance
phrases indicating strong opinions or definitive statements
Negative Logits
Token        Logit
artments     -0.68
eral         -0.68
ifferent     -0.65
istries      -0.62
osponsors    -0.61
urches       -0.60
yrics        -0.59
yss          -0.59
ibilities    -0.59
iries        -0.59
Positive Logits
Token          Logit
underrated     1.03
THE            0.95
the            0.87
one            0.86
arguably       0.84
unmatched      0.83
coolest        0.81
unparalleled   0.80
the            0.79
best           0.78
Activations Density: 0.354%