INDEX
Explanations
discussions around social and political actions and their implications
Negative Logits
apolis
-0.17
ilig
-0.16
done
-0.15
beg
-0.14
exert
-0.14
undertaken
-0.14
isko
-0.14
dup
-0.14
irit
-0.13
qli
-0.13
Positive Logits
worthy
0.16
onto
0.15
необходим
0.15
worthy
0.15
Worth
0.14
瓜
0.14
oad
0.14
ара
0.14
equal
0.13
.sleep
0.13
Activations Density 0.629%