INDEX
Explanations
arguments or perspectives put forth in a text
phrases that present arguments or counterarguments
Negative Logits
fecture      -0.71
Volunte      -0.70
naughty      -0.66
Niño         -0.66
Transition   -0.65
Monitoring   -0.64
destiny      -0.64
superv       -0.62
Nurs         -0.62
Sync         -0.62
Positive Logits
arguments    1.48
argument     1.41
argument     1.40
Argument     1.36
arguing      1.33
Arg          1.31
rebutt       1.30
convinc      1.23
persuasive   1.20
refute       1.19
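The dashboard does not say how these logit values were produced. One common way to get such lists is to project the feature's decoder direction directly onto the model's unembedding matrix and read off the most negative and most positive entries. The sketch below assumes exactly that direct path, ignoring the final LayerNorm and any later layers; the function name, the argument names `w_dec_feature`, `W_U`, `vocab`, and the toy data are all hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_feature, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature pushes their logits.

    Assumption (hypothetical setup): the feature's decoder vector, shape
    (d_model,), is multiplied straight into the unembedding W_U, shape
    (d_model, d_vocab), with no LayerNorm or later layers in between.
    """
    logit_effect = w_dec_feature @ W_U          # shape (d_vocab,)
    order = np.argsort(logit_effect)            # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, a four-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -0.5]])
vocab = ["argument", "persuasive", "destiny", "Sync"]
w_dec_feature = np.array([1.0, 2.0])

neg, pos = top_logit_effects(w_dec_feature, W_U, vocab, k=2)
print(neg)  # [('destiny', -1.0), ('Sync', -0.5)]
print(pos)  # [('persuasive', 2.0), ('argument', 1.0)]
```

Under this assumption a positive value means that, when the feature is active, it raises the logit of that token (here, argument-related tokens) and a negative value means it lowers it.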
Activation density: 0.369%
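Activation density is usually read as the share of token positions on which the feature fires at all, so 0.369% corresponds to roughly 369 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the dashboard does not state its threshold, and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is
    strictly greater than the threshold (0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 369 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:369] = 1.0
print(activation_density(acts))  # 0.369
```

A density this low indicates a sparse feature that stays silent on the vast majority of tokens.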