Explanations
syntax elements like code comments and dots before newlines
ellipses and unfinished sentences or thoughts
Negative Logits
ーティ           -0.92
uers          -0.76
Galile        -0.70
ratulations   -0.69
ously         -0.66
aven          -0.65
Stall         -0.64
Krug          -0.64
slope         -0.63
slopes        -0.62
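The first negative-logit token is a Japanese katakana string that GPT-2-style byte-level BPE tokenizers display as `ãĥ¼ãĥĨãĤ£`. The page does not name the tokenizer, so this sketch assumes GPT-2's byte-to-unicode table. It rebuilds that table and inverts it to recover the UTF-8 text behind the token. `bytes_to_unicode` and `decode_token` are illustrative helpers written here, not part of any library's API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # The remaining bytes (controls, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Map a byte-level BPE token string back to the UTF-8 text it encodes."""
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĥĨãĤ£"))  # ーティ
```

The same table explains other display quirks of byte-level tokens. For example, a leading `Ġ` in a raw token string stands for a leading space.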
Positive Logits
etc           1.04
walking       0.87
please        0.83
where         0.81
fixme         0.81
ordered       0.80
SER           0.77
999           0.77
cum           0.76
ser           0.76
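The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix, then keep the k most negative and k most positive entries. That method is an assumption here, because the page does not say how its numbers were produced. The sketch below uses pure Python with toy weights. `decoder_dir`, `unembed`, and the resulting effects are hypothetical and are not values from this feature.

```python
import heapq


def logit_effects(decoder_dir, unembed, vocab):
    """Dot the (hypothetical) feature direction with each token's unembedding column.

    decoder_dir: list of d_model floats
    unembed: d_model x vocab_size matrix, stored as a list of rows
    vocab: list of vocab_size token strings
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive (descending) and k most negative (ascending) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary (all values hypothetical).
decoder_dir = [1.0, -0.5]
unembed = [[0.5, -0.2, 0.1],
           [0.2, 0.8, -0.4]]
vocab = ["etc", "slope", "please"]

effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

The ordering matches the lists above: positive effects are listed from largest to smallest, and negative effects from most negative upward.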
Activation density: 0.011%
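Activation density is the share of token positions on which the feature is active. Assume "active" means an activation strictly above zero; the page does not state its threshold. Under that assumption, 0.011% corresponds to about 11 active positions per 100,000 tokens, or roughly 1 in 9,100. A minimal sketch of the calculation, using made-up activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 11 active positions out of 100,000 tokens.
acts = [0.0] * 99_989 + [1.3] * 11
density = activation_density(acts)
print(f"{density:.3%}")  # 0.011%
```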