INDEX
Explanations
phrases indicating results, benefits, or actions to be taken
Negative Logits
Token      Value
ULK        -0.15
اÛĮاÙĨ      -0.15
outes      -0.14
xAB        -0.14
shadow     -0.14
ạ          -0.14
ær         -0.14
oker       -0.14
shadows    -0.14
wart       -0.14
Positive Logits
Token      Value
324        0.17
olet       0.16
fal        0.15
clas       0.15
loh        0.14
apol       0.14
ugi        0.14
159        0.14
.blob      0.14
ISCO       0.14
Activations Density: 0.087%