Explanations
references to milk and dairy products
Negative Logits
ihar        -0.17
iÅŁ         -0.17
ucher       -0.15
reffen      -0.14
sembles     -0.14
libertine   -0.14
Ville       -0.14
Danh        -0.14
anie        -0.14
aign        -0.13
Positive Logits
shake       0.44
maid        0.37
maids       0.33
weed        0.32
sh          0.28
shed        0.25
fat         0.25
man         0.23
ier         0.22
repl        0.22
Activations Density: 0.010%