INDEX
Explanations
words or phrases that introduce contrast or exceptions
Negative Logits
Cæsar      -1.11
Majefty    -1.08
purpoſe    -1.06
Efq        -1.05
houſe      -1.03
pleaſure   -1.02
Theſe      -1.01
ſtate      -1.00
Monfieur   -0.99
quæ        -0.99
Positive Logits
but        1.03
due        0.78
Due        0.62
and        0.61
or         0.60
more       0.58
]),        0.57
More       0.56
to         0.55
as         0.55
Activations Density 0.286%