Explanations
phrases related to expressing disappointment or disapproval
negative statements related to actions or behaviors
Negative Logits
Marina          -0.62
Juda            -0.60
gradually       -0.60
periodically    -0.59
udo             -0.58
Kafka           -0.57
Kraft           -0.56
Abbas           -0.56
Liberty         -0.55
Parad           -0.55
Positive Logits
anymore         1.41
âĢ              1.28
̶               1.17
âľ              1.13
*.              1.09
\.              1.04
âĺ              1.03
âķ              1.03
ãĢ              1.03
nor             1.02
Activations Density 0.506%