NEGATIVE LOGITS
Token     Logit
subp      -0.70
RANT      -0.62
measures  -0.61
saline    -0.61
lihood    -0.60
rict      -0.59
judging   -0.58
Simpson   -0.57
paddle    -0.57
living    -0.57
POSITIVE LOGITS
Token     Logit
Aviv       1.41
estial     1.09
ugu        1.04
stra       1.00
lez        0.95
angelo     0.92
eno        0.91
ibia       0.87
anyahu     0.86
edy        0.84
Activation Density: 0.025%