INDEX
Explanations
instances of words related to promotions or the utilization of resources or information for specific purposes
themes related to power dynamics and social issues
Negative Logits
Token      Logit
essing     -0.58
"],"       -0.57
OSED       -0.57
osion      -0.56
icable     -0.55
layer      -0.55
ategory    -0.54
retty      -0.54
én         -0.52
sonian     -0.52
Positive Logits
Token             Logit
sparing           1.17
wisely            1.10
to                1.02
interchange       0.99
extensively       0.97
pseudonym         0.96
as                0.83
instead           0.83
metaphor          0.81
inappropriately   0.81
Activations Density 0.363%
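
One common reading of an activation density figure is the share of sampled tokens on which the feature fires at all. The sketch below assumes that definition; the function name, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard, which may compute the figure differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active. The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 363 of 100,000 tokens.
sample = [1.0] * 363 + [0.0] * 99_637
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density well below 1% would indicate a sparse feature that is silent on the vast majority of tokens.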