INDEX
Explanations
instances of control and personal agency in relationships and societal structures
Negative Logits
wolf            -0.17
acio            -0.15
ork             -0.15
ablo            -0.14
Wrath           -0.14
ãĥ¼ãĤ¹ãĥĪ       -0.14
pson            -0.14
438             -0.14
quette          -0.14
abit            -0.14
Positive Logits
nor             0.22
aux             0.17
dames           0.16
erva            0.15
nor             0.15
Greenwood       0.15
ÑĢаз            0.15
unless          0.14
icorn           0.14
unless          0.14
Activations Density: 0.208%