INDEX
Explanations
phrases expressing knowledge or awareness
instances of the word "knew" and variations of it, indicating familiarity or prior knowledge
Negative Logits
otion         -0.77
pex           -0.72
phrine        -0.71
area          -0.70
olid          -0.70
ré            -0.67
ulum          -0.67
Yugoslavia    -0.64
por           -0.64
ILCS          -0.63
Positive Logits
beforehand     0.97
ledged         0.83
lessly         0.80
bryce          0.73
nothing        0.72
lege           0.72
saf            0.70
instinctively  0.69
($)            0.69
footed         0.67
Activations Density 0.032%
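
As a rough illustration of how a density figure like the one above might be derived, the sketch below assumes that "activations density" means the percentage of token positions on which the feature fires (activation above zero). That definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions made for illustration; they are not taken from the dashboard itself.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions / total positions) * 100.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, only 3 of which fire.
toy_activations = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(toy_activations)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.032% would correspond to the feature firing on roughly 3 of every 10,000 tokens, which fits a narrow feature tied to "knew"-style phrasing.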