INDEX
Explanations
phrases related to making arguments or presenting points
instances of the word "argument" and related discussions
Negative Logits
Token       Logit
ookie       -0.66
livest      -0.64
Seym        -0.64
ISTER       -0.63
Atomic      -0.62
Carbuncle   -0.61
eco         -0.61
cler        -0.61
curfew      -0.61
PHOTO       -0.60
Positive Logits
Token       Logit
ative       1.24
uments      1.16
against     1.04
arguments   0.95
abl         0.94
ument       0.93
argument    0.86
arguing     0.85
persu       0.84
Against     0.84
Activations Density: 0.039%