Explanations
references to folk art and culture
Negative Logits
yx          -0.18
chyb        -0.17
ズ          -0.17
eel         -0.16
cpy         -0.16
ghest       -0.16
mia         -0.15
roperties   -0.15
quia        -0.15
psilon      -0.15
Positive Logits
lor         0.44
lore        0.35
ways        0.27
ta          0.26
лор         0.26
oric        0.24
lo          0.23
ore         0.23
tale        0.20
WAYS        0.20
Activations Density 0.008%