Explanations
phrases containing proper names followed by verbs or actions
references to sources or credits in the text
Negative Logits
Token         Logit
"rient"       -0.70
"terday"      -0.60
"ynchron"     -0.60
"neum"        -0.59
"theless"     -0.59
"soon"        -0.58
"ima"         -0.58
"fame"        -0.57
"estern"      -0.57
"rency"       -0.56
Positive Logits
Token         Logit
":"            0.85
"Provided"     0.83
"Cards"        0.75
"Card"         0.71
"CARD"         0.69
"Card"         0.68
"Sources"      0.65
"giving"       0.64
":'"           0.63
"Wass"         0.63
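The page does not say how these logit values were produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and then rank the resulting per-token values. The sketch below assumes that approach. It uses toy numbers in place of real model weights, and the function names `logit_effects` and `top_logits` are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix laid out as [d_model][vocab], giving one logit effect per token.
    Assumption: this dot-product projection is how the dashboard's values
    were computed; the source does not confirm it."""
    vocab_size = len(w_unembed[0])
    return [
        sum(direction[i] * w_unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # smallest effects first
    positive = ranked[::-1][:k]      # largest effects first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 3 tokens (hypothetical values).
direction = [1.0, -0.5]
w_unembed = [[0.2, -0.4, 0.9],
             [0.6, 0.2, -0.2]]
tokens = ["Card", "soon", "Provided"]

effects = logit_effects(direction, w_unembed)
negative, positive = top_logits(effects, tokens, k=1)
print(negative, positive)
```

With real weights, `k` would be 10 to match the two lists above. Actual dashboards may also fold in layer-norm scaling, which this sketch leaves out.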
Activation density: 0.020%
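Activation density is usually the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. The page does not state the threshold or the sample size, so the sketch below assumes a threshold of 0.0. The helper name `activation_density` is hypothetical. At 0.020%, the feature fires on about 1 token in 5,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.
    Assumption: 'firing' means activation > 0; the source does not specify."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: one firing token among 5,000.
sample = [0.0] * 4999 + [3.2]
print(f"{activation_density(sample):.3f}%")
```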