INDEX
Explanations
themes related to socio-political criticism and power dynamics
Negative Logits
uple       -0.16
eyen       -0.16
ore        -0.14
ASP        -0.14
Affero     -0.14
dur        -0.14
details    -0.14
odic       -0.14
so         -0.14
ft         -0.14
Positive Logits
adí        0.21
agal       0.19
.protobuf  0.14
plá        0.14
acho       0.14
действ     0.14
631        0.14
tolower    0.14
infeld     0.13
ází        0.13
Activations Density 0.490%