INDEX
Explanations
phrases related to consideration or attention directed towards different topics or subjects
phrases that involve the expression of perspectives or considerations on various topics
Negative Logits
avorite     -0.71
submar      -0.66
egg         -0.65
preached    -0.62
tatt        -0.60
Straw       -0.60
Panc        -0.59
Clojure     -0.59
Guth        -0.58
mith        -0.58
Positive Logits
rection      0.85
ments        0.81
perature     0.80
ibility      0.79
rients       0.77
rait         0.76
equality     0.76
rity         0.76
orea         0.75
tains        0.75
Activations Density 0.027%