INDEX
Explanations
phrases related to statements or opinions
terms related to potential restrictions or limitations
Negative Logits
Mous          -0.75
Manit         -0.69
anwhile       -0.67
princ         -0.66
Sprint        -0.65
Franch        -0.62
Albuquerque   -0.61
CMS           -0.61
Huntington    -0.60
sacrific      -0.60
Positive Logits
ľ             1.56
Ŀ             1.30
¼             1.22
¡             1.18
Ń             1.13
¿             1.11
Ķ             1.11
°             1.05
¦             1.03
âĶĢ           1.03
Activations Density 0.255%
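
The positive-logit tokens listed above look like byte-level BPE display characters rather than ordinary text. The source does not name the tokenizer, so the following is only a minimal sketch under the assumption that the vocabulary uses the GPT-2 style byte-to-unicode display mapping. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> display-character table (assumed mapping)."""
    # Bytes that are shown as themselves because they are printable.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: display character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Convert a displayed token string back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


# Tokens copied from the positive-logit list.
tokens = ["ľ", "Ŀ", "¼", "¡", "Ń", "¿", "Ķ", "°", "¦", "âĶĢ"]
for tok in tokens:
    raw = token_to_bytes(tok)
    # errors="replace" because a lone continuation byte is not valid UTF-8.
    print(repr(tok), raw.hex(), repr(raw.decode("utf-8", errors="replace")))
```

Under this assumption, `âĶĢ` corresponds to the bytes `e2 94 80`, which is the UTF-8 encoding of the box-drawing character `─`. Each single-character token maps to one byte in the range 0x80 to 0xBF, which is a UTF-8 continuation byte and does not decode on its own.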