INDEX
Explanations
phrases indicating knowledge or understanding
instances of the word "knows."
Negative Logits
phrine          -0.94
oples           -0.77
isco            -0.69
unal            -0.68
OPLE            -0.68
nesota          -0.68
anmar           -0.67
uld             -0.67
interstitial    -0.67
adies           -0.66
Positive Logits
ledged          1.04
ledge           0.84
terday          0.78
whats           0.77
how             0.72
lege            0.71
exactly         0.71
afer            0.71
how             0.70
LED             0.67
Activations Density: 0.043%