Explanations
words related to human or humanoid figures or representations
references to significant individuals or roles
Negative Logits
oise  -0.71
umenthal  -0.70
esis  -0.69
Policy  -0.68
velength  -0.64
izabeth  -0.63
iott  -0.63
itcher  -0.63
Ples  -0.62
é¾  -0.61
Positive Logits
head  1.21
heads  1.20
skating  1.12
prominently  1.03
downs  0.83
doms  0.80
books  0.76
enance  0.76
awa  0.76
hig  0.75
Activation Density: 0.030%