Explanations
references to lions and related terms within various contexts
Negative Logits
undry -0.21
alian -0.16
anson -0.15
ipv -0.15
rope -0.15
urent -0.15
alon -0.14
Ñĥла -0.14
neck -0.14
ustin -0.14
Positive Logits
ess 0.32
esses 0.32
cub 0.24
ardo 0.23
lion 0.23
Lion 0.23
el 0.22
ESS 0.21
mane 0.20
elles 0.20
Activation density: 0.011%
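A density of 0.011% means the feature fires very rarely, on roughly 11 of every 100,000 tokens. The sketch below assumes density is the percentage of sampled tokens whose feature activation is above zero. The function name `activation_density`, the `threshold` parameter and the sample data are hypothetical and are used only for illustration. The dashboard's actual dataset and cutoff are not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    Assumes density = (# tokens with activation > threshold) / (# tokens) * 100.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up example: 11 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.5  # place 11 nonzero activations at distinct positions

print(f"{activation_density(acts):.3f}%")  # prints "0.011%"
```

A strict `> threshold` comparison is used so that tokens with an activation of exactly zero (the usual "off" state for a ReLU-style feature) are not counted as active.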