INDEX
Explanations
negative sentiments or expressions of dissatisfaction
Token before a number
bibliographic references
Negative Logits
both          -0.56
both          -0.54
...           -0.49
!             -0.49
Literatura    -0.47
...           -0.47
*             -0.46
the           -0.45
Примітки      -0.45
*             -0.45
Positive Logits
raiſ          0.85
ſever         0.81
poffe         0.78
fevere        0.78
faſt          0.77
ſta           0.77
Monfieur      0.73
ſche          0.73
Charlemagne   0.73
greateſt      0.72
Activations Density 0.313%