Explanations
phrases related to problem-solving or figuring things out
Negative Logits
/about      -0.16
upa         -0.15
Sebastian   -0.15
uali        -0.15
aeda        -0.14
elsey       -0.14
gow         -0.14
subsequ     -0.14
/ne         -0.14
OW          -0.13
Positive Logits
figured     0.38
figure      0.37
fig         0.33
figure      0.31
fig         0.30
-figure     0.28
figures     0.28
FIG         0.27
FIG         0.27
figures     0.26
Activations Density: 0.017%