Explanations
phrases related to the potential for positive or effective outcomes when guidelines or support are in place
Negative Logits
safest     -0.19
uros       -0.17
jen        -0.14
безопас    -0.13  (Russian stem, "safe/safety")
ACY        -0.13
safer      -0.13
arest      -0.13
ングル     -0.13  (Japanese katakana fragment)
Cove       -0.13
wig        -0.13
Positive Logits
properly    0.42
proper      0.40
proper      0.39
Proper      0.37
correctly   0.33
правильно   0.29  (Russian, "correctly")
richtig     0.28  (German, "correct")
正确        0.28  (Chinese, "correct")
đúng        0.25  (Vietnamese, "correct")
correct     0.23
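Lists of positive and negative logits like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The page does not state its exact method, so the following is only a minimal sketch under that assumption; `top_logit_effects`, `w_dec_row`, `W_U`, and the toy vocabulary are hypothetical names used for illustration.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the unembedding.

    Assumptions (hypothetical, not taken from the source page):
      w_dec_row: shape (d_model,), the feature's decoder vector
      W_U:       shape (d_model, n_vocab), the model's unembedding matrix
      vocab:     list of token strings indexed by token id
    Returns (top_positive, top_negative) as lists of (token, value).
    """
    logits = w_dec_row @ W_U          # direct effect on every token's logit
    order = np.argsort(logits)        # ascending: most negative first
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return pos, neg

# Toy example: identity unembedding, so each logit equals the decoder entry.
vocab = ["safe", "proper", "correct", "cat"]
W_U = np.eye(4)
w = np.array([-0.2, 0.4, 0.3, 0.0])
pos, neg = top_logit_effects(w, W_U, vocab, k=2)
print(pos)  # [('proper', 0.4), ('correct', 0.3)]
print(neg)  # [('safe', -0.2), ('cat', 0.0)]
```

Some tooling also normalizes the decoder vector or applies the final layer norm's scale before projecting, which changes the magnitudes but usually not the ranking; treat the exact values above as illustrative.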
Activation Density: 0.391%
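Activation density is usually read as the fraction of token positions in a sample dataset on which the feature fires; 0.391% corresponds to roughly 1 token in 256. A minimal sketch, assuming "fires" means an activation strictly above zero (the page does not state its threshold or dataset); `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    acts = np.asarray(activations)
    return float((acts > threshold).mean())

# One firing position out of 256 tokens gives the density shown above.
acts = np.zeros(256)
acts[17] = 1.5
d = activation_density(acts)
print(f"{100 * d:.3f}%")  # 0.391%
```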