INDEX
Explanations
phrases that describe actions or states occurring simultaneously
Negative Logits
Token     Logit
bourg     -0.19
oggles    -0.15
ysi       -0.15
ollower   -0.15
ulis      -0.14
heimer    -0.14
alette    -0.14
ınızda    -0.14
cec       -0.14
evice     -0.14
Positive Logits
Token     Logit
eder      0.15
aph       0.15
they      0.15
bows      0.15
doors     0.14
trains    0.14
cupid     0.14
pret      0.14
ews       0.14
ph        0.14
Activations Density 0.081%
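
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the dashboard's own computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which have a nonzero activation.
sample = [0.0] * 9992 + [1.3] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on almost all tokens, which is typical of a sparse feature.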