Explanations
phrases indicating purpose or intention related to an action
Negative Logits
ovna       -0.17
anlık      -0.15
cia        -0.15
düşür      -0.15
イツ       -0.14
朗         -0.14
ei         -0.14
.struts    -0.14
anford     -0.14
adan       -0.14
Positive Logits
justice        0.32
differently    0.28
Justice        0.27
justice        0.27
wrong          0.26
Justice        0.24
Wrong          0.20
backwards      0.19
wrong          0.18
WRONG          0.18
Activations Density 0.040%