Explanations
text fragments indicating contrast or comparison within narratives
Negative Logits
bes         -0.16
anno        -0.15
ally        -0.15
ilst        -0.14
dolayı      -0.14
kara        -0.14
reno        -0.13
nnen        -0.13
ln          -0.13
ilo         -0.13
Positive Logits
lington      0.16
eck          0.15
gos          0.15
Rosenstein   0.14
avec         0.14
ンズ         0.14
emble        0.14
Meadows      0.14
quette       0.14
cul          0.14
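The positive and negative logit lists above rank vocabulary tokens by how much this feature pushes their output logits up or down. The sketch below shows one common way such lists are produced. It is an illustration under stated assumptions, not the method behind this dashboard. It assumes a feature's logit effect is its decoder vector projected through the model's unembedding matrix (d_model x vocab). The helper names `logit_effects` and `top_and_bottom` and all toy values are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: logit effect = decoder_vec @ unembed (toy sizes, made-up values).

def logit_effects(decoder_vec, unembed):
    """Return one score per vocab entry: the dot product of decoder_vec with each unembedding column."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(vocab_size)
    ]

def top_and_bottom(scores, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (all values invented).
vocab = ["a", "b", "c", "d"]
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.6, 0.2, -0.2, 0.0],
]
scores = logit_effects(decoder_vec, unembed)
neg, pos = top_and_bottom(scores, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

On a real model, the same ranking over the full vocabulary with k = 10 would yield two ten-entry lists shaped like the ones shown here.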
Activation Density: 0.227%
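Activation density is usually read as the share of tokens on which the feature fires. A value of 0.227% means roughly 2.27 firing tokens per 1,000. The minimal sketch below makes that arithmetic concrete. It assumes "fires" means an activation strictly greater than zero; the function name `activation_density` and the toy activation data are hypothetical.

```python
# Hypothetical sketch: activation density = fraction of tokens with activation > 0.
# Assumption: a feature "fires" on a token when its activation is strictly positive.

def activation_density(activations):
    """Return the fraction of tokens on which the feature is active."""
    if not activations:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)

# Toy data: 227 firing tokens spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(227):
    acts[i * 400] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.227%
```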