INDEX
Explanations
phrases indicating personal opinions or reflections
Negative Logits
cona        -0.15
Flush       -0.15
ÏĦÏį        -0.14
_DOM        -0.14
_flush      -0.14
lobals      -0.14
olla        -0.13
Ñijл        -0.13
osyal       -0.13
NECT        -0.13
Positive Logits
think        0.41
Think        0.39
thinking     0.39
THINK        0.38
think        0.38
Think        0.36
thinking     0.35
thinks       0.34
-thinking    0.33
thought      0.32
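Logit lists like these are commonly read as a feature's direct effect on the output: the feature's decoder direction multiplied by the model's unembedding matrix, with tokens then sorted by the resulting score. The sketch below assumes that reading and is not taken from this page; `feature_logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical, chosen only to show the computation.

```python
# Minimal sketch, ASSUMING the logit lists are decoder_vec @ W_U for one
# feature. Function names and all numbers are hypothetical toy values,
# not data from this feature.

def feature_logit_effects(decoder_vec, unembed, vocab):
    """Return (token, logit) pairs for decoder_vec @ unembed.

    decoder_vec: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    """
    n_vocab = len(vocab)
    logits = [0.0] * n_vocab
    # Accumulate the dot product of decoder_vec with each vocab column of W_U.
    for d, weight in enumerate(decoder_vec):
        row = unembed[d]
        for t in range(n_vocab):
            logits[t] += weight * row[t]
    return list(zip(vocab, logits))


def top_and_bottom(pairs, k):
    """Return the k most positive and the k most negative (token, logit) pairs."""
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    vocab = ["think", "Flush", "thought", "cona"]
    decoder_vec = [0.5, -0.2]            # hypothetical 2-dim decoder direction
    unembed = [
        [0.8, -0.1, 0.6, -0.2],          # hypothetical W_U row for dim 0
        [-0.1, 0.4, 0.1, 0.3],           # hypothetical W_U row for dim 1
    ]
    pairs = feature_logit_effects(decoder_vec, unembed, vocab)
    pos, neg = top_and_bottom(pairs, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In this toy setup "think" and "thought" rank highest and "cona" and "Flush" rank lowest, mirroring the shape (not the values) of the lists above.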
Activations Density 0.009%
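If activation density is taken to mean the share of token positions where the feature's activation is above zero (an assumption; the page does not define it), 0.009% works out to roughly 9 firings per 100,000 tokens, so this feature is very sparse. A minimal sketch under that assumption; `activation_density` and its `threshold` parameter are hypothetical names.

```python
# Minimal sketch, ASSUMING density = percentage of token positions whose
# activation exceeds a threshold (0.0 by default). Names are hypothetical.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical run: 9 nonzero activations spread over 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(9):
        acts[i * 1000] = 1.5
    print(f"{activation_density(acts):.3f}%")
```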