Explanations
statements about violation of rules or regulations
Prepositions and auxiliary verbs
Negative Logits
ValueStyle    -0.60
comprised     -0.58
이버          -0.56
-             -0.56
Pyx           -0.56
tih           -0.56
Nesta         -0.55
―             -0.54
précis        -0.54
:             -0.54
Positive Logits
Efq           0.76
^(@)          0.76
Inscrivez     0.73
honom         0.70
theſe         0.70
itſelf        0.69
Theſe         0.69
Diſ           0.69
myſelf        0.68
الحره         0.68
Activations Density: 2.521%