Explanations
phrases that indicate permission or enablement for actions or processes
Negative Logits
utherland   -0.15
RIPT        -0.14
cury        -0.14
uf          -0.14
帮          -0.14
θερ         -0.14
UF          -0.14
ventus      -0.13
ulfilled    -0.13
pomoc       -0.13
Positive Logits
us          0.31
you         0.20
easy        0.17
ffects      0.16
us          0.16
him         0.16
swer        0.16
easily      0.15
собой       0.15
them        0.14
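Logit values like those listed above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method (this page does not confirm it), ignores any normalization or bias terms the real tool may apply, and uses hypothetical names (`top_logit_effects`, `decoder_vec`, `unembed`, `vocab`).

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: unembed[d][v], d_model x vocab
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs for one feature."""
    d_model = len(decoder_vec)
    # One score per vocabulary token: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_vec[d] * unembed[d][v] for d in range(d_model))
        for v in range(len(vocab))
    ]
    # Sort ascending by score: the head holds the most negative logits,
    # the reversed list's head holds the most positive ones.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    neg, pos = top_logit_effects(
        [1.0, -1.0], [[0.5, 0.1, -0.2], [0.1, 0.4, 0.3]], ["a", "b", "c"], k=1
    )
    print(neg, pos)
```

Because the score is a plain dot product, a token can appear twice in such a list only if the vocabulary contains two distinct tokens that render the same way (for example, with and without a leading space).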
Activation density: 0.057%
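Activation density is read here as the fraction of sampled token positions on which the feature fires. At 0.057% that is 0.00057 of all positions, or roughly 57 tokens in every 100,000. A minimal sketch, assuming a position counts as active when its activation exceeds a hypothetical `threshold` of 0.0 and that the sampled tokens are representative (neither is stated on this page):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (hypothetical helper)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample instead of dividing by zero.
    return active / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample: 57 active positions out of 100,000.
    acts = [0.0] * 99_943 + [1.2] * 57
    print(f"{activation_density(acts) * 100:.3f}%")
```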