INDEX
Explanations
instances where someone is making a specific observation or point
instances of the word "noted."
Negative Logits
quer      -0.83
atom      -0.72
soever    -0.71
ravel     -0.71
icum      -0.68
adesh     -0.68
heed      -0.67
ascus     -0.66
WAY       -0.66
iland     -0.66
Positive Logits
inconsistencies    0.86
noting             0.81
discrepancies      0.77
how                0.75
approving          0.74
prominently        0.73
similarities       0.72
unct               0.72
favorably          0.71
notes              0.68
Activations Density: 0.042%