INDEX
Explanations
clauses indicating exceptions or contrasts
Negative Logits
Token       Logit
Sn          -0.17
sn          -0.16
omen        -0.14
aca         -0.14
138         -0.14
IN          -0.14
over        -0.14
Sn          -0.14
sounding    -0.14
Fra         -0.14
Positive Logits
Token         Logit
outu          0.17
slightly      0.15
orsk          0.15
Oak           0.15
Orm           0.15
こちら        0.15
iв            0.15
maal          0.14
additionally  0.14
WISE          0.14
Activations Density: 0.151%