Explanations
references to specific animal species and their characteristics
Negative Logits
fin       -0.16
ppv       -0.16
rada      -0.16
radu      -0.14
aylight   -0.14
mpar      -0.14
ÙħاÙĦ      -0.14
_Parse    -0.14
proh      -0.14
Oak       -0.14
Positive Logits
jab       0.19
lag       0.18
зай       0.18
Bison     0.17
ser       0.17
loth      0.17
pt        0.17
ele       0.17
ser       0.16
pec       0.16
Activations Density 0.024%