INDEX
Explanations
references to authors or contributors in academic texts
Negative Logits
myſelf      -0.99
tranſ       -0.98
Anſ         -0.98
ſever       -0.98
pleaſure    -0.96
ſeveral     -0.96
faſt        -0.96
houſe       -0.94
reaſon      -0.94
purpoſe     -0.94
Positive Logits
et            1.62
Et            1.08
ET            0.98
للمعارف       0.95
Et            0.93
et            0.92
cetera        0.81
appelez       0.80
AttributeSet  0.78
puesta        0.75
Activations Density 0.047%