Explanations
instances where something is being controlled or influenced by an authoritative force
words related to control or authority
Negative Logits
token       logit
swick       -0.75
este        -0.72
Tok         -0.71
att         -0.69
Hat         -0.69
Dak         -0.69
Charity     -0.69
Mary        -0.67
Fighting    -0.67
Horn        -0.66
Positive Logits
token       logit
dictated    1.48
dictate     1.35
dictates    1.26
confir      1.12
dict        0.93
rul         0.83
governs     0.83
lapt        0.81
corrid      0.81
龍契士      0.79
Activations Density 0.013%
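
The density figure above can be read as the share of sampled token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled token positions
    on which the feature is active. The source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 13 of which activate the feature.
sample = [0.0] * 99_987 + [0.4] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

A density this low would mean the feature is sparse: under the assumed definition, it fires on roughly 13 of every 100,000 tokens.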