Explanations
phrases related to providing insight or a brief preview of future events
references to glimpses or previews of information or concepts
Negative Logits
hement      -0.75
ients       -0.71
KK          -0.70
cial        -0.69
ubs         -0.67
die         -0.66
lees        -0.66
arently     -0.64
eches       -0.64
depended    -0.63
Positive Logits
whats        0.91
sorts        0.83
helm         0.77
theirs       0.73
ours         0.73
doom         0.71
what         0.69
physiology   0.69
reality      0.68
how          0.66
Activations Density 0.187%