INDEX
Explanations
instances of the word "think" and its variations used to prompt reflection or consideration
Negative Logits
ジア    -0.16
ious    -0.16
leton   -0.14
vido    -0.14
ldr     -0.14
hle     -0.14
φυ      -0.13
iano    -0.13
арч     -0.13
روم     -0.13
Positive Logits
tors     0.16
ock      0.15
拥       0.15
amina    0.15
erve     0.14
orthand  0.14
ascar    0.14
LOAT     0.14
lü       0.13
ollo     0.13
Activations Density 0.024%
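
The density figure above can be read as the share of sampled tokens on which the feature fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: `activation_density`, its `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active (activation > threshold), as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: the feature fires on 12 of 50,000 sampled tokens.
sample = [0.0] * 49_988 + [1.5] * 12
print(f"{activation_density(sample):.3f}%")  # prints "0.024%"
```

A density this low would mean the feature is sparse: under the assumed definition it is active on roughly one token in every four thousand.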