INDEX
Explanations
references to the term "Debate" and its variations
Negative Logits
rophic          -0.16
lle             -0.15
isci            -0.15
ÙĦÛĮسÛĮ         -0.15
GINE            -0.15
kte             -0.15
mini            -0.15
knull           -0.15
ivet            -0.14
exploitation    -0.14
Positive Logits
Deb             0.23
ates            0.20
endra           0.19
ussy            0.19
deb             0.18
bye             0.18
deb             0.18
ilitating       0.18
chema           0.18
ora             0.18
Activations Density 0.008%