INDEX
Explanations
notions of negation or contrast in various contexts
NEGATIVE LOGITS
(blank)          -0.62
A                -0.56
A                -0.54
N                -0.49
<strong>         -0.48
N                -0.44
and              -0.44
L                -0.44
amp              -0.43
G                -0.43
POSITIVE LOGITS
itſelf           1.20
aarrggbb         1.19
myſelf           1.06
Theſe            1.06
faſt             1.04
autorytatywna    1.03
Reſ              1.01
consultato       1.00
Monfieur         0.98
raiſ             0.97
Activations Density 0.294%