INDEX
Explanations
inquiries about opinions or feedback
NEGATIVE LOGITS
ightly       -0.15
weit         -0.15
oz           -0.14
á»ķ          -0.13
avan         -0.13
otherwise    -0.13
TOTYPE       -0.13
èı²å¾ĭ宾     -0.13
lobs         -0.13
ict          -0.13
POSITIVE LOGITS
think        0.56
thinks       0.50
Think        0.49
thoughts     0.47
Think        0.47
think        0.44
thought      0.42
thinking     0.42
THINK        0.41
Thoughts     0.40
Activations Density: 0.043%