Explanations
phrases related to observation or noticing
instances of "notice" and related expressions signaling observation or acknowledgment
Negative Logits
Token         Logit
uries         -0.78
ribes         -0.75
ieve          -0.70
certific      -0.68
icted         -0.66
aneers        -0.63
faith         -0.62
etime         -0.62
ribe          -0.62
ompl          -0.62
Positive Logits
Token             Logit
similarity         1.06
similarities       1.02
resemblance        0.99
inconsistency      0.88
pecul              0.84
discrepancies      0.82
subtle             0.82
discrepancy        0.81
difference         0.81
inconsistencies    0.80
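Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that idea under stated assumptions: the function names `logit_effects` and `top_and_bottom` are hypothetical, `W_U` is assumed to have shape d_model x vocab_size, and any final LayerNorm or scaling the real dashboard may apply is ignored. It is not the dashboard's actual implementation.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction
    (length d_model) through an unembedding matrix W_U
    (d_model rows x vocab_size columns), giving one score per token.
    Assumption: no final LayerNorm or bias is applied."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(logits: Sequence[float],
                   vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative tokens
    (most negative first) and the k most positive tokens
    (most positive first)."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return bottom, top
```

With a toy two-dimensional model, `logit_effects([1.0, 0.5], [[1.0, -0.6, 0.4], [0.1, -0.1, 0.8]])` scores the vocabulary `["similarity", "faith", "subtle"]` at roughly 1.05, -0.65 and 0.8, so `top_and_bottom(..., k=1)` ranks "faith" as the most negative token and "similarity" as the most positive one.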
Activations Density 0.260%
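An activation density of 0.260% means the feature fires on a small fraction of tokens. The sketch below shows one plausible way to compute such a figure. It assumes, without confirmation from the dashboard, that density is the percentage of sampled tokens whose activation exceeds a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens on which the feature's
    activation is strictly greater than `threshold`.
    Assumption: a feature "fires" when its activation is above zero."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

For example, a feature that is active on 26 of 10,000 sampled tokens gives `activation_density([0.0] * 9974 + [1.0] * 26)`, which is about 0.26, matching the density reported above.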