Explanations
themes related to power dynamics and control in various contexts
Negative Logits
Token     Logit
icz       -0.15
arios     -0.14
vou       -0.14
aeper     -0.14
shi       -0.13
άλι       -0.13
EXPORT    -0.13
uds       -0.13
langs     -0.13
848       -0.12
Positive Logits
Token     Logit
away      1.77
Away      1.54
away      1.40
Away      1.35
-away     1.27
aways     0.74
weg       0.72
AW        0.59
.aw       0.53
awy       0.47
Activation Density: 0.449%
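Activation density summarizes how often the feature fires across the evaluated tokens. The sketch below is a minimal illustration, assuming density means the percentage of token positions with a strictly positive activation. The dashboard's exact threshold and sample are not stated here, and the function name and toy data are hypothetical.

```python
def activation_density(activations):
    """Return the percentage of token positions where the feature fires.

    Assumption: "fires" means a strictly positive activation; the real
    dashboard may use a different threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical toy data: the feature fires on 9 of 2,000 tokens.
acts = [0.0] * 1991 + [1.2] * 9
print(f"{activation_density(acts):.3f}%")  # 0.450%
```

A density below 1%, as reported above, indicates a sparse feature that is inactive on the vast majority of tokens.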