INDEX
Explanations
references to the word "De"
Negative Logits
intenant  -0.76
Houſe     -0.73
Theſe     -0.71
itſelf    -0.70
Sitten    -0.70
Anſ       -0.70
Efq       -0.69
iprot     -0.69
#+#       -0.68
leaſt     -0.68
Positive Logits
De        3.06
De        2.77
Де        1.15
DeV       1.13
Di        1.06
Du        1.04
Da        0.97
Di        0.95
Da        0.94
Des       0.93
Activations Density: 0.105%