Explanations
phrases that express hypothetical situations or conditional thoughts
Negative Logits
Token       Logit
plor        -0.17
insky       -0.15
oka         -0.15
-heading    -0.14
oplay       -0.14
ç´          -0.14
令          -0.14
ãĥ¼ãĥIJ      -0.14
XR          -0.13
agh         -0.13
Positive Logits
Token       Logit
think       0.29
THINK       0.28
thinks      0.26
Think       0.26
thinking    0.26
thought     0.25
Think       0.25
think       0.24
expects     0.24
expect      0.23
Activations Density: 0.032%