INDEX
Explanations
instances of the word "more"
Negative Logits
oved    -0.08
Ped     -0.06
omba    -0.06
agma    -0.06
rud     -0.06
eda     -0.06
oa      -0.06
,       -0.06
tr      -0.05
elin    -0.05
Positive Logits
burgh   0.08
poil    0.08
pok     0.07
_Tis    0.07
åĭ      0.07
cazzo   0.07
_DECLS  0.07
assel   0.07
Äįel    0.07
šli     0.07
Activations Density 0.001%