Explanations
proper nouns
the recurring mention of specific names in the text
Negative Logits
Token      Logit
iances     -0.78
iance      -0.77
Scotia     -0.71
inity      -0.67
Reference  -0.64
bably      -0.63
é¾         -0.62
dawn       -0.62
antine     -0.61
ACT        -0.60
Positive Logits
Token      Logit
hu         0.97
hin        0.92
hiro       0.91
hee        0.88
sein       0.88
ryu        0.88
es         0.87
iewicz     0.86
burgh      0.84
esh        0.84
Activations Density: 0.045%