Explanations
references to military installations or forts
Negative Logits
Himo              -0.97
myſelf            -0.88
Efq               -0.84
pleaſure          -0.81
themſelves        -0.79
resourceCulture   -0.78
greateſt          -0.78
Theſe             -0.78
Majefty           -0.75
SBATCH            -0.74
Positive Logits
Fort               0.66
FORT               0.54
Worth              0.53
worth              0.52
Z                  0.52
l                  0.52
Fort               0.51
Worth              0.50
St                 0.49
worth              0.47
Activations Density: 0.070%