INDEX
Explanations
references to animals, specifically pets and other related items
Negative Logits
success    -0.15
           -0.15
force      -0.14
emap       -0.14
yz         -0.14
orage      -0.14
itters     -0.13
ili        -0.13
minimum    -0.13
raq        -0.13
Positive Logits
-shaped      0.28
-themed      0.27
motif        0.24
-inspired    0.21
themed       0.21
shape        0.20
-theme       0.19
造           0.19
theme        0.18
shape        0.18
Activations Density 0.212%