INDEX
Explanations
phrases with definite articles
Negative Logits
itſelf          -1.01
raiſ            -0.94
myſelf          -0.92
preſent         -0.90
Plenum          -0.88
chré            -0.86
themſelves      -0.86
pleaſure        -0.86
religieuses     -0.84
Efq             -0.84
Positive Logits
the              1.19
de               1.02
The              0.95
De               0.83
The              0.76
Οι               0.75
den              0.75
                 0.74
Z                0.73
the              0.73
Activation Density: 0.022%