INDEX
Explanations
instances where something happens repeatedly or sequentially
phrases indicating repetition or emphasizing singularity
NEGATIVE LOGITS
Lazarus    -0.66
aman       -0.65
mun        -0.62
LAB        -0.60
Mirage     -0.59
scroll     -0.58
eret       -0.58
orter      -0.57
uty        -0.57
wm         -0.56
POSITIVE LOGITS
anymore        0.98
nor            0.91
mention        0.77
slightest      0.76
anywhere       0.75
necessarily    0.73
describ        0.69
anything       0.68
anybody        0.64
coincidence    0.63
Activations Density: 0.312%