Explanations
specific German articles and pronouns in sentences
Negative Logits
itſelf      -0.97
houſe       -0.97
pleaſure    -0.96
purpoſe     -0.92
propOrder   -0.91
ſtate       -0.90
fubject     -0.86
raiſ        -0.85
myſelf      -0.84
Majefty     -0.81
Positive Logits
der         1.17
Die         0.98
Οι          0.95
Der         0.95
den         0.94
The         0.90
The         0.90
THE         0.85
THE         0.83
die         0.83
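Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below shows that idea in plain Python under that assumption; the function name `logit_effects` and its inputs (`decoder_row`, `unembed`, `vocab`) are hypothetical and not taken from this dashboard's actual pipeline.

```python
def logit_effects(decoder_row, unembed, vocab, k=10):
    """Hypothetical sketch: rank tokens by a feature's direct logit effect.

    decoder_row: list[float], the feature's decoder direction (length d_model)
    unembed: d_model rows, each a list[float] of length len(vocab)
    vocab: list[str], token strings indexed like the unembedding columns
    Returns (top_positive, top_negative), each a list of (token, value).
    """
    d_model = len(decoder_row)
    # logit[j] = sum_i decoder_row[i] * unembed[i][j]
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit value: most negative first.
    ranked = sorted(zip(vocab, logits), key=lambda tv: tv[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


pos, neg = logit_effects([1.0, -1.0], [[1, 0, 2], [0, 1, -1]], ["a", "b", "c"], k=1)
print(pos, neg)
```

Because tokens are shown as stripped strings, two distinct vocabulary entries (for example with and without a leading space) can appear identical in such a list, which may explain the repeated "The" and "THE" rows.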
Activation Density: 0.009%
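Activation density is usually read as the percentage of token positions on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name and the threshold default are illustrative assumptions, not this dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A density of 0.009% corresponds to roughly 9 active positions per 100,000 tokens.
sample = [0.0] * 99_991 + [1.0] * 9
print(activation_density(sample))
```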