INDEX
Explanations
the word "da" in various contexts
Negative Logits
Token         Logit
itſelf        -1.05
Diſ           -0.98
themſelves    -0.94
Reſ           -0.93
leaſt         -0.87
Anſ           -0.84
ſeveral       -0.82
myſelf        -0.80
raiſ          -0.80
poffible      -0.79
Positive Logits
Token         Logit
da            2.21
Da            2.10
Da            2.00
da            1.53
DA            1.51
DA            1.37
Dahl          1.15
да            1.02
DAZ           0.99
Да            0.91
Activations Density 0.075%