Explanations
the definite article 'the' at the beginning of sentences
Negative Logits
olicy     -0.80
arians    -0.73
APH       -0.71
acca      -0.69
mania     -0.69
itus      -0.69
arcity    -0.68
ivas      -0.67
fn        -0.67
emen      -0.65
Positive Logits
midst        1.54
vicinity     1.23
middle       1.16
meantime     1.09
aftermath    1.08
hallway      1.05
doorway      1.04
woods        1.02
wake         1.02
foreground   1.00
Activations Density 0.231%