Explanations
words related to exploration and discovery
Negative Logits
Token     Logit
uracy     -0.16
ména      -0.15
ilia      -0.15
Unidos    -0.15
η         -0.15
ural      -0.15
igi       -0.15
hee       -0.14
ched      -0.14
endar     -0.14
Positive Logits
Token     Logit
ways      0.16
horn      0.15
lust      0.15
ments     0.15
es        0.14
948       0.14
غة        0.14
ives      0.14
ry        0.14
67        0.14
Activations Density: 0.031%