Explanations
indicators of social and moral debates, particularly regarding sexuality and governance
Negative Logits
Token         Logit
MOOTH         -0.16
едж           -0.16
Mattis        -0.15
pupper        -0.15
readcr        -0.15
IMIT          -0.14
-scrollbar    -0.14
661           -0.13
èķī           -0.13
fucks         -0.13
Positive Logits
Token         Logit
same          0.31
same          0.23
sod           0.22
evolution     0.22
poly          0.21
capital       0.21
ped           0.21
Capital       0.20
Same          0.20
Same          0.20
Activations Density 0.305%