INDEX
Explanations
mentions of animals and related concepts
Negative Logits
mund      -0.16
огод      -0.16
enance    -0.15
atform    -0.15
eday      -0.15
indr      -0.15
atch      -0.15
edy       -0.14
684       -0.14
bies      -0.14
Positive Logits
/people   0.21
istic     0.18
arendra   0.15
st        0.15
ause      0.14
kingdom   0.14
-rights   0.14
hud       0.13
.fig      0.13
üstü      0.13
Activations Density 0.047%