INDEX
Explanations
words related to correctness and properness
Negative Logits
-0.20
ρό  -0.17
icap  -0.16
arine  -0.15
usz  -0.15
aries  -0.15
éli  -0.14
ary  -0.14
że  -0.14
очек  -0.14
Positive Logits
fully  0.20
latter  0.16
erken  0.15
Proper  0.15
proper  0.15
이고  0.14
izont  0.14
edList  0.14
ately  0.14
dess  0.14
Activations Density 0.030%