INDEX
Explanations
instances of dialogue and character interactions in film and literature
NEGATIVE LOGITS
lava        -0.17
atha        -0.17
argas       -0.15
argins      -0.15
ãĥªãĤ¹      -0.15
AMESPACE    -0.15
ousel       -0.14
erosis      -0.14
Streamer    -0.14
sÃŃ         -0.14
POSITIVE LOGITS
dummy       0.16
ysa         0.16
actual      0.15
Safety      0.15
Authentic   0.14
Actual      0.14
ãĤº         0.14
iyan        0.14
Dummy       0.13
Wahl        0.13
Activations Density 0.217%