INDEX
Explanations
terms related to power structures and their effects on society
Negative Logits
tsy       -0.17
odash     -0.17
utsch     -0.17
tls       -0.17
tember    -0.17
(s        -0.17
tep       -0.17
tridge    -0.16
placer    -0.16
togroup   -0.16
Positive Logits
es        1.40
(es       0.76
esin      0.61
ES        0.59
s         0.59
eses      0.57
ses       0.53
esModule  0.49
'es       0.49
ness      0.48
Activations Density: 0.299%