Negative Logits
at        -3.20
tại       -1.27
at        -1.13
in        -1.02
At        -0.93
bei       -0.84
At        -0.80
ở         -0.77
as        -0.75
a         -0.74

Positive Logits
Monfieur   1.23
Theſe      1.16
Efq        1.05
myſelf     1.01
Cæsar      1.00
becauſe    0.95
itſelf     0.95
pleaſure   0.94
himſelf    0.90
fevere     0.89
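
The two logit lists are consistent with the usual way SAE feature dashboards rank tokens: project the feature's decoder direction onto each token's unembedding vector, then keep the most negative and most positive results. The following is a minimal sketch under that assumption, which this page does not confirm. The function names `logit_effects` and `top_and_bottom`, and all of the toy vectors, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive

# Hypothetical 2-dimensional toy values; a real model uses d_model-sized vectors
# and a full vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "at": [-2.0, 1.0],
    "in": [-0.5, 0.5],
    "myſelf": [0.5, -0.4],
    "Theſe": [1.0, 0.0],
}

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary. Real dashboards may also fold in layer-norm scaling, which this sketch ignores.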
Activation Density: 0.954%
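
Activation density is commonly reported as the percentage of token positions on which a feature's activation is nonzero. Under that reading, 0.954% means the feature fires on roughly 1 in 105 tokens. Below is a minimal sketch under that assumption. The function name `activation_density`, the strict `> 0.0` threshold, and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 954 of 100,000 token positions.
sample = [0.0] * (100_000 - 954) + [0.5] * 954
print(f"Activation density: {activation_density(sample):.3f}%")
```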