INDEX
Explanations
phrases that express personal experiences or memories
NEGATIVE LOGITS
roduced      -0.15
linger       -0.15
เ            -0.14
alian        -0.14
リカ         -0.14
given        -0.14
cabo         -0.14
sent         -0.13
ormal        -0.13
breeze       -0.13
POSITIVE LOGITS
saw          0.35
experienced  0.34
heard        0.33
Saw          0.30
witnessed    0.29
observed     0.29
heard        0.28
Heard        0.26
caught       0.25
seen         0.25
Activations Density: 0.202%