INDEX
Explanations
questions starting with "how"
Negative Logits
Token    Logit
uely     -0.17
飯       -0.16
ermen    -0.15
ören     -0.15
uate     -0.15
cente    -0.15
adele    -0.15
uman     -0.15
uisse    -0.15
uent     -0.15
Positive Logits
Token    Logit
soever   0.28
itz      0.26
itzer    0.24
beit     0.24
arth     0.20
ards     0.16
ARD      0.15
麼       0.15
egg      0.15
樣       0.15
Activations Density: 0.102%