INDEX
Explanations
terms related to arguments and debates
references to arguments or discussions
Negative Logits
Token     Logit
eco       -0.73
livest    -0.71
cler      -0.69
ummer     -0.68
Atomic    -0.67
Seym      -0.67
isner     -0.65
aches     -0.64
oho       -0.63
Mos       -0.61
Positive Logits
Token      Logit
uments     1.12
ative      1.05
arguments  0.99
argument   0.89
ument      0.87
ļéĨĴ       0.85
acle       0.81
arguing    0.77
against    0.77
-+-+       0.76
Activations Density: 0.024%