Explanations
References to a specific individual or entity
Negative Logits
ive        -0.17
cole       -0.17
gether     -0.16
umblr      -0.16
heiten     -0.16
_DECREF    -0.16
_IOC       -0.16
олож       -0.15
ermen      -0.15
entine     -0.15
Positive Logits
s           0.37
/her        0.35
SELF        0.32
self        0.24
atically    0.23
sing        0.22
Ùĩ          0.22
elf         0.21
-self       0.20
/us         0.20
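Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token results. The page does not state how its values were produced, so the following is only a minimal sketch under that assumption; `decoder_dir`, `w_u`, `logit_effects`, and `top_logits` are hypothetical names, and the toy numbers are invented for illustration, not taken from this feature.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  w_u: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; w_u has d_model rows and one column
    per vocabulary token. Both are hypothetical inputs for illustration.
    """
    effects: Dict[str, float] = {}
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        effects[tok] = sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
    return effects


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary (made-up values).
vocab = ["self", "/her", "ive", "cole"]
decoder_dir = [1.0, 0.5]
w_u = [[0.2, 0.3, -0.1, 0.0],
       [0.1, 0.0, -0.2, -0.1]]

negative, positive = top_logits(logit_effects(decoder_dir, w_u, vocab), k=2)
print("Negative:", negative)
print("Positive:", positive)
```

A real model's vocabulary has tens of thousands of tokens, which is why only the ten strongest entries on each side are typically shown.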
Activation density: 0.005%
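Activation density is usually the share of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the page does not state the threshold), 0.005% corresponds to roughly 5 active positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 5 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for pos in (10, 2_000, 30_000, 55_555, 99_999):
    acts[pos] = 1.3
density = activation_density(acts)
print(f"Activation density: {density:.3f}%")
```

A density this low marks a very sparse feature, so its logit tables above rest on a small number of activating contexts.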