INDEX
Explanations
time-related expressions and indications
phrases indicating familiarity or prior knowledge about a topic
Negative Logits
Dialogue      -0.63
prolong       -0.63
sacrific      -0.58
Hispan        -0.58
Spectre       -0.58
grades        -0.57
INGS          -0.56
downgrade     -0.56
Akin          -0.55
inflic        -0.54
Positive Logits
know          1.15
noticed       1.13
know          1.11
guessed       1.10
probably      1.09
heard         1.03
familiar      1.01
already       0.98
undoubtedly   0.96
knew          0.95
Activations Density 0.193%