Negative Logits
itſelf        -0.85
ordinances    -0.83
treaties      -0.81
decrees       -0.79
ſeveral       -0.75
myſelf        -0.74
Theſe         -0.73
")->          -0.71
Ordin         -0.69
accords       -0.69
Positive Logits
and           0.57
.             0.56
key           0.52
ver           0.49
local         0.49
ré            0.49
so            0.49
fun           0.49
enderror      0.47
pre           0.47
Activations Density 0.020%