INDEX
Explanations
phrases that indicate challenges or obstacles
Negative Logits
pearance    -0.15
strap       -0.15
zc          -0.15
macen       -0.14
akan        -0.13
luet        -0.13
/Branch     -0.13
iaux        -0.13
fuck        -0.13
foon        -0.13
Positive Logits
ly          0.31
/im         0.25
task        0.23
-to         0.22
/exp        0.22
terrain     0.21
khăn        0.21
icult       0.20
tasks       0.20
ies         0.19
Activations Density 0.053%