INDEX
Explanations
keywords related to logical reasoning and persuasive language
occurrences of the word "arguments."
Negative Logits
orporated    -0.67
onet         -0.66
fecture      -0.66
behold       -0.63
covered      -0.63
ifter        -0.62
Merrill      -0.62
ishable      -0.62
cko          -0.60
iph          -0.59
Positive Logits
arguments    3.72
argument     2.66
argument     2.23
Argument     2.10
Arg          1.71
objections   1.66
debates      1.58
assertions   1.54
Arg          1.52
arguing      1.49
Activations Density: 0.015%