Explanations
the word "knew."
instances of the word "knew," indicating awareness or prior knowledge
Negative Logits
Token          Logit
otion          -0.74
phrine         -0.74
oples          -0.68
ILCS           -0.68
ré             -0.68
conservancy    -0.66
olid           -0.66
interstitial   -0.65
ember          -0.65
area           -0.64
Positive Logits
Token          Logit
beforehand     0.98
ledged         0.86
lessly         0.78
lege           0.77
ledge          0.77
footed         0.72
saf            0.72
nothing        0.71
cut            0.70
instinctively  0.70
Activations Density: 0.047%