INDEX
Explanations
instances of the word "debate" and its variations
Negative Logits
ses         -0.10
yat         -0.08
поб         -0.07
βά          -0.07
же          -0.07
rone        -0.07
_subplot    -0.07
åĢij         -0.07
ll          -0.07
sm          -0.07
Positive Logits
/disc        0.09
室           0.08
about        0.08
.nlm         0.07
able         0.07
/question    0.07
ative        0.07
afil         0.07
ble          0.07
yne          0.07
Activations Density: 0.006%