Explanation
Inquiries and opinions from the audience
Negative Logits
weetalert    -0.15
amburger     -0.15
乃           -0.15
ightly       -0.14
weit         -0.14
具           -0.14
ingo         -0.14
्दर          -0.14
PushMatrix   -0.13
ι            -0.13
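Positive Logits
think        0.52
thinks       0.48
Think        0.47
Think        0.45
thoughts     0.43
think        0.41
thinking     0.40
THINK        0.40
thought      0.39
Thoughts     0.38
(Tokens that appear twice, such as think and Think, are separate vocabulary entries. They most likely differ by a leading space that the page does not display.)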
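Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix, then keeping the highest- and lowest-scoring tokens. The sketch below assumes that convention. Real dashboards may apply extra normalization that is not shown here. The three-dimensional vectors, the four-token vocabulary, and the function names `logit_effects` and `top_k_split` are hypothetical toy values, not this feature's actual weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Score each token by dot(decoder_dir, its unembedding column), ascending."""
    scores = [(token, sum(d * w for d, w in zip(decoder_dir, column)))
              for token, column in unembed.items()]
    return sorted(scores, key=lambda pair: pair[1])


def top_k_split(scores: List[Tuple[str, float]], k: int):
    """Return (k most positive, most positive first; k most negative, most negative first)."""
    positive = list(reversed(scores))[:k]
    negative = scores[:k]
    return positive, negative


# Hypothetical toy values, NOT this feature's real weights.
decoder_dir = [1.0, 0.0, 0.5]
unembed = {
    "think":      [0.4, 0.1, 0.2],
    "thought":    [0.3, 0.0, 0.1],
    "amburger":   [-0.1, 0.3, -0.1],
    "PushMatrix": [0.0, 0.2, -0.2],
}

positive, negative = top_k_split(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends produces the two lists from a single pass over the vocabulary. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.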
Activation density: 0.051% of tokens
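Activation density is usually reported as the percentage of token positions in a sample where the feature's activation is above zero. The page does not state the threshold or the sample size, so this definition is an assumption. Under it, 0.051% means 51 firing tokens per 100,000, or roughly 1 token in 1,961. The function `activation_density` and the sample below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 100,000 token positions, 51 of which fire.
sample = [0.0] * 100_000
for i in range(51):
    sample[i * 1_000] = 1.0

density = activation_density(sample)
print(f"{density:.3f}%")
print(f"about 1 in {round(100 / density):,} tokens")
```

A density this low marks a sparse feature, one that stays silent on almost every token.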