Explanations
phrases related to rules and regulations in various contexts
Negative Logits
Token          Logit
ルト           -0.16
_mD            -0.15
edException    -0.15
_tE            -0.15
acades         -0.14
apore          -0.14
alore          -0.14
плит           -0.14
%B             -0.14
ислов          -0.14
Positive Logits
Token          Logit
umber          0.16
other          0.16
similarly      0.15
likewise       0.15
naopak         0.14
bie            0.14
¤              0.14
imilar         0.14
others         0.14
dit            0.14
Activations Density 0.336%