INDEX
Explanations
mentions or observations of specific details in text
instances of the word "note" and its interactions with various subjects in the document
Negative Logits
Token          Logit
UME            -0.73
certific       -0.72
agn            -0.67
atown          -0.66
advertising    -0.66
ribe           -0.65
ribes          -0.64
icted          -0.64
asio           -0.63
adolesc        -0.63
Positive Logits
Token              Logit
similarities       1.09
similarity         0.96
discrepancies      0.93
signs              0.88
inconsistencies    0.87
resemblance        0.79
inconsistency      0.75
differences        0.74
deviations         0.72
anomalies          0.71
Activations Density 0.221%
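
As a rough illustration of what a density figure like the one above could mean, the following is a minimal sketch. It assumes that density is the fraction of token positions on which the feature's activation exceeds a threshold; the page itself does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active tokens) / (total tokens).
    The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which are active.
acts = [0.0] * 998 + [1.5, 0.3]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.221% would correspond to the feature firing on roughly 2 of every 1000 tokens.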