INDEX
Explanations
phrases expressing knowledge or awareness
phrases indicating familiarity or prior knowledge
Negative Logits
oreal        -0.73
xa           -0.71
rontal       -0.71
vati         -0.68
oyer         -0.67
orthy        -0.66
ongevity     -0.65
cific        -0.64
iterranean   -0.63
foreseen     -0.63
Positive Logits
already      0.73
by           0.68
yourselves   0.68
me           0.67
BY           0.66
76561        0.60
tale         0.60
guessed      0.58
about        0.56
that         0.56
Activations Density: 0.128%