INDEX
Explanations
phrases related to military units or specific commands
references to military commands or organizations
Negative Logits
OHN            -0.69
Niet           -0.69
Democr         -0.65
soluble        -0.64
Schne          -0.60
Anita          -0.60
aunder         -0.58
Rhod           -0.58
Shapiro        -0.57
heterogeneity  -0.57
Positive Logits
eering         1.13
Command        1.06
eers           0.99
eer            0.96
Commands       0.90
Command        0.89
eur            0.87
force          0.85
hammer         0.84
quartered      0.83
Activations Density: 0.008%