Explanations
phrases related to political ideologies and economic systems
Negative Logits
SourceFile    -0.73
olves         -0.70
ULAR          -0.69
ãĥ¥           -0.69
arily         -0.65
erver         -0.62
minster       -0.62
assed         -0.62
alysed        -0.61
MpServer      -0.61
Positive Logits
however       1.10
although      0.97
according     0.94
there         0.92
though        0.92
despite       0.87
unlike        0.86
moreover      0.82
we            0.82
somew         0.82
Activations Density 0.880%