Explanations
words ending with "owers"
phrases related to power dynamics
references to power dynamics and authority figures
Negative Logits
ERAL        -0.72
âĸ¬         -0.69
Philipp     -0.69
ר           -0.65
׾           -0.64
ric         -0.64
cs          -0.64
Condition   -0.64
Pacific     -0.63
à©          -0.63
Positive Logits
chwitz      0.96
hops        0.96
peed        0.93
ynthesis    0.92
kinson      0.91
pace        0.89
ktop        0.88
hift        0.88
uits        0.87
creen       0.83
Activations Density: 0.008%