Explanations
descriptive adjectives or phrases indicating evaluation or opinion
instances of the word "it"
Negative Logits
Token        Logit
paran        -0.60
evil         -0.59
Dayton       -0.56
Eighth       -0.55
anton        -0.54
heartbeat    -0.52
kie          -0.52
————————     -0.51
former       -0.51
VA           -0.51
Positive Logits
Token        Logit
beh          1.27
seems        1.26
begs         1.11
appears      1.09
becomes      1.06
seemed       1.04
zik          1.03
chy          1.01
Seems        1.00
unes         0.99
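The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their logits. The page does not say how these scores were computed; a common approach for sparse-autoencoder feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector. The sketch below assumes that approach. `top_and_bottom_logits`, `decoder_direction`, and `unembed` are hypothetical names, and the toy vectors are invented for illustration, not taken from this feature.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the (assumed) direct logit effect of a feature.

    decoder_direction: hypothetical decoder vector of the feature (length d_model).
    unembed: hypothetical map from token string to its unembedding column (length d_model).
    Returns (positive, negative): the k highest scores (largest first)
    and the k lowest scores (most negative first).
    """
    # Score each token as the dot product of the feature direction with its unembedding column.
    scores = {
        token: sum(d * w for d, w in zip(decoder_direction, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # Take the k lowest-scoring tokens and list the most negative one first.
    negative = sorted(ranked[-k:], key=lambda item: item[1])
    return positive, negative


# Toy example with a 2-dimensional residual stream.
direction = [1.0, -0.5]
toy_unembed = {
    "seems": [1.2, 0.0],
    "appears": [1.0, 0.1],
    "evil": [-0.4, 0.4],
    "former": [0.0, 1.0],
}
pos, neg = top_and_bottom_logits(direction, toy_unembed, k=2)
```

With these toy vectors, `pos` holds `seems` and `appears` and `neg` holds `evil` followed by `former`, mirroring the shape of the two lists on this page.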
Activation density: 0.193%
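Activation density is read here as the percentage of tokens on which the feature fires (activation above zero), so 0.193% works out to roughly one token in every 500 (1 / 0.00193 ≈ 518). The sketch below assumes that definition and a threshold of zero; `activation_density` is a hypothetical helper, and the sample activations are made up.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumes density means "fraction of tokens where the feature is active",
    expressed as a percentage; returns 0.0 for an empty input.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# One of four sample tokens is active, giving 25.0 percent.
sample = [0.0, 0.0, 2.5, 0.0]
density = activation_density(sample)
```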