INDEX
Explanations
words related to obedience and disobedience in the context of social or personal dynamics
Negative Logits
Token          Logit
Lauder         -0.83
Divinity       -0.79
UAL            -0.78
Painter        -0.77
ILLE           -0.75
âĸ¬            -0.71
Birch          -0.69
UES            -0.68
EngineDebug    -0.68
Martial        -0.68
Positive Logits
Token          Logit
vious          1.28
lig            1.23
served         1.23
esity          1.21
amacare        1.18
fusc           1.14
serv           1.13
lique          1.12
server         1.11
acter          1.11
Activations Density 0.008%