INDEX
Explanations
expressions of surprise or approval
Negative Logits
Token        Logit
hd           -0.16
umbled       -0.15
akt          -0.15
ataloader    -0.15
ograd        -0.15
seekers      -0.14
ollapse      -0.14
ilder        -0.14
eed          -0.14
apat         -0.14
Positive Logits
Token        Logit
oop          0.19
Inline       0.17
anden        0.15
ubi          0.15
ービ         0.14
UGIN         0.14
ELY          0.14
Conc         0.14
inea         0.14
IEL          0.14
Activations Density: 0.151%