INDEX
Explanations
references to power dynamics and societal structures
that follow a verb
future work also
Negative Logits
httphttps     -0.65
ftagPool      -0.49
锈钢           -0.46
TagMode       -0.46
kaarangay     -0.44
tonode        -0.44
İstinadlar    -0.42
رشف           -0.40
שוליים        -0.39
cours         -0.39
Positive Logits
also          0.60
dessutom      0.51
inoltre       0.50
grunns        0.48
også          0.47
heller        0.47
außerdem      0.46
also          0.46
також         0.45
myös          0.45
Activations Density 0.924%