INDEX
NEGATIVE LOGITS
rộng        -0.09
scammers    -0.08
litigation  -0.08
où          -0.08
Carnival    -0.08
scams       -0.07
lawsuits    -0.07
അനു�        -0.07
شق          -0.07
cheerful    -0.07
POSITIVE LOGITS
imposed         0.09
preconce        0.09
anything        0.09
Enforcement     0.09
constraint      0.08
urity           0.08
enforcing       0.08
perfectly       0.08
anything        0.08
necesariamente  0.08
Activations Density: 0.026%