Explanations
statements and responses related to political or official communication
Negative Logits
Token       Logit
Prov        -0.15
696         -0.15
pitched     -0.14
owers       -0.14
prov        -0.14
prov        -0.14
stro        -0.14
directly    -0.14
aul         -0.13
Bour        -0.13
Positive Logits
Token       Logit
Trang       0.15
semiclass   0.15
Bund        0.15
Soup        0.14
warts       0.14
extr        0.14
cu          0.14
UNK         0.14
Ñģви        0.13
zsche       0.13
Activations Density 0.062%