INDEX
Explanations
phrases related to political and legal discussions
Negative Logits
*/(          -0.74
orate        -0.70
istically    -0.69
istical      -0.68
ibilities    -0.67
aciously     -0.67
proble       -0.66
bably        -0.66
othal        -0.64
thal         -0.63
Positive Logits
ours         0.84
those        0.78
those        0.74
unts         0.74
Ray          0.68
çͰ           0.67
ounter       0.65
par          0.63
myself       0.63
Graves       0.62
Activations Density: 0.056%