INDEX
Explanations
phrases related to political positions and directives
Negative Logits
ibaba        -0.97
etsk         -0.71
abi          -0.71
ovsky        -0.63
ashtra       -0.61
burgh        -0.60
agascar      -0.58
endas        -0.57
oğ           -0.56
worshipped   -0.56
Positive Logits
margins      0.94
margin       0.84
standards    0.84
virtue       0.78
means        0.76
Means        0.72
committee    0.71
Numbers      0.71
sheer        0.68
standpoint   0.63
Activations Density: 0.888%