INDEX
Explanations
phrases related to politics and governance
Negative Logits
thood        -0.69
namely       -0.68
replace      -0.67
arians       -0.66
hereby       -0.66
owns         -0.65
craft        -0.65
separately   -0.65
watching     -0.63
wisely       -0.63
Positive Logits
slightest    1.06
entirety     1.04
proverbial   1.04
entire       0.96
remainder    0.94
ses          0.91
burden       0.89
gap          0.88
envelope     0.87
same         0.86
Activations Density: 0.184%