INDEX
Explanations
mentions or references to autographs
references to autographs
Negative Logits
lihood     -0.80
imaru      -0.79
IRO        -0.77
BLE        -0.73
Soda       -0.71
UTION      -0.70
BE         -0.69
ORK        -0.69
Passage    -0.68
edin       -0.67
Positive Logits
ographs      1.26
ograph       1.23
opsy         1.11
ographed     1.08
umn          1.07
ogyn         1.01
ocom         1.00
iques        0.99
oly          0.98
istically    0.98
Activations Density 0.016%