Explanations
phrases indicating optimal times for action or consideration
Negative Logits
ancies        -0.74
arser         -0.73
iries         -0.70
uded          -0.68
urat          -0.66
usters        -0.64
leys          -0.63
uthor         -0.62
qus           -0.61
teness        -0.61
Positive Logits
opportunity   0.98
opp           0.83
ripe          0.73
introducing   0.72
FORE          0.71
for           0.69
attRot        0.68
sleeper       0.66
NING          0.66
fro           0.66
Activations Density 0.100%