Explanations
phrases that indicate quantity or amount
Negative Logits
pleaſure    -0.85
Efq    -0.79
―――――    -0.74
itſelf    -0.72
Cæsar    -0.70
Houſe    -0.70
fhort    -0.70
raiſ    -0.70
NDEBUG    -0.70
becauſe    -0.69
Positive Logits
of    1.14
Of    0.85
OF    0.83
ReusableCell    0.82
المعيارى    0.81
ompok    0.80
OfClass    0.80
Of    0.80
của    0.76
unked    0.75
Activations Density 0.147%