Explanations
phrases indicating ability or possibility
Negative Logits
Token         Logit
Reloaded      -0.65
Lug           -0.61
erville       -0.60
NetMessage    -0.56
abin          -0.56
hess          -0.55
REF           -0.54
atti          -0.54
alloc         -0.53
entin         -0.53
Positive Logits
Token         Logit
guessed        1.04
attest         0.90
imagine        0.88
guess          0.82
infer          0.81
seen           0.77
doubtless      0.73
see            0.73
noticing       0.72
notice         0.72
Activation Density: 0.041%