Explanations
phrases related to political discussions
phrases indicating a sense of negativity or hardship
Negative Logits
metic       -0.68
Vector      -0.59
omorphic    -0.58
oun         -0.57
Owl         -0.56
mosqu       -0.56
citiz       -0.56
blanket     -0.54
ected       -0.54
Frontier    -0.54
Positive Logits
s           1.85
ses         1.41
sb          1.20
sat         1.08
sg          1.04
sets        1.04
si          1.04
itates      1.03
ski         1.02
ends        1.02
Activations Density: 0.160%