NEGATIVE LOGITS
aarrggbb    -0.50
(           -0.46
↵           -0.45
ot          -0.44
In          -0.44
to          -0.43
↵↵          -0.43
Interess    -0.43
маг         -0.42
sy          -0.41
POSITIVE LOGITS
abestanden       0.81
Roskov           0.81
$_"              0.80
poffible         0.79
Wikimedijinoj    0.77
ſmall            0.77
themſelves       0.76
Anſ              0.75
fubject          0.75
pleaſure         0.74
Activations Density: 0.004%