INDEX
Explanations
references to pets and their significance
Negative Logits
arkan      -0.15
MÃľ        -0.14
illard     -0.14
chalk      -0.14
ovel       -0.14
Babylon    -0.14
":["       -0.13
lotte      -0.13
roscope    -0.13
оÑĢон      -0.13
Positive Logits
fur        0.28
humans     0.27
fur        0.27
pur        0.25
Humans     0.25
human      0.24
Humans     0.24
pur        0.23
Pur        0.22
paw        0.21
Activations Density 0.001%