INDEX
Negative Logits
popup       -0.07
writing     -0.06
Debbie      -0.06
iscard      -0.06
Kashmir     -0.06
notas       -0.06
-writing    -0.06
_GRE        -0.06
fuck        -0.06
.people     -0.06
POSITIVE LOGITS
URED        0.07
stva        0.07
inter       0.06
Average     0.06
Begins      0.06
yu          0.06
terior      0.06
взаим       0.06
신          0.06
resco       0.06
Activations Density 0.010%