INDEX
Explanations
expressions related to hierarchical authority and command structures
Negative Logits
token    logit
oÄŁ      -0.17
wargs    -0.15
erties   -0.15
alley    -0.15
ela      -0.15
borg     -0.14
bourg    -0.14
ulton    -0.14
udeau    -0.14
elves    -0.14
Positive Logits
token         logit
request       0.48
instructions  0.45
requests      0.45
orders        0.44
instruction   0.42
directions    0.41
commands      0.40
request       0.37
command       0.37
requested     0.37
Activations Density 0.237%