INDEX
Explanations
instances of reflection and contemplation about names or concepts
Negative Logits
Token     Logit
337       -0.17
oli       -0.16
aliz      -0.16
oly       -0.16
elage     -0.15
GRAPH     -0.15
igans     -0.15
thr       -0.15
thesis    -0.14
hf        -0.14
Positive Logits
Token     Logit
think     0.53
Think     0.49
think     0.47
Think     0.45
thinking  0.43
thinks    0.42
thought   0.42
THINK     0.37
thinking  0.36
Thinking  0.36
Activations Density: 0.044%