INDEX
Explanations
inquiries about self-reflection and personal opinions
Negative Logits
inton         -0.16
Å¡tÄĽ          -0.15
499           -0.15
":"'          -0.15
iliz          -0.14
Tire          -0.14
ÑĥÑĢи         -0.14
allel         -0.14
adesh         -0.13
BACKGROUND    -0.13
Positive Logits
think         1.02
Think         0.94
thinking      0.91
think         0.91
THINK         0.89
thinks        0.89
Think         0.88
thoughts      0.82
thought       0.79
thinking      0.76
Activations Density 0.449%