INDEX
Explanations
phrases indicating capabilities and functional attributes
Negative Logits
uch      -0.19
egie     -0.18
ロー     -0.17
ucch     -0.17
koc      -0.16
bjerg    -0.16
uche     -0.15
utzer    -0.15
ollo     -0.15
insky    -0.15
Positive Logits
569      0.15
phans    0.14
Surv     0.14
cob      0.13
ład      0.13
iao      0.13
込       0.13
erto     0.13
疲       0.12
patch    0.12
Activations Density 0.045%