INDEX
Negative Logits
auffi        -1.07
Theſe        -1.07
ſta          -1.06
pleaſure     -1.05
houſe        -1.02
purpoſe      -1.00
ſelves       -0.98
Diſ          -0.98
Reſ          -0.98
Autoritní    -0.96
Positive Logits
(token not visible)    0.72
in                     0.70
↵                      0.62
(                      0.58
[                      0.58
ma                     0.57
Re                     0.57
of                     0.57
In                     0.54
m                      0.53
Activations Density 0.017%