INDEX
Explanations
references to animal features and characteristics, particularly wings and tails
Negative Logits
obao      -0.18
668       -0.16
imers     -0.16
(blank)   -0.16
berapa    -0.16
ichick    -0.16
бав       -0.15
iseum     -0.15
ogue      -0.15
svp       -0.15
Positive Logits
less       0.21
n          0.16
grace      0.15
TI         0.15
-equipped  0.15
ate        0.14
ello       0.14
ia         0.14
lessness   0.14
lou        0.13
Activations Density 0.053%