Explanations
phrases indicating steps or instructions
Negative Logits
avou            -0.17
stor            -0.16
åŁº             -0.15
heimer          -0.15
roe             -0.15
istik           -0.14
ê´Ģ             -0.14
decis           -0.14
ãĤ¤ãĥ³ãĥĪ       -0.14
osas            -0.14
Positive Logits
sie             0.17
ei              0.16
Schul           0.15
eat             0.15
.sell           0.15
fly             0.14
fly             0.14
sell            0.14
Schultz         0.14
Appro           0.14
Activations Density 0.158%