INDEX
Explanations
arguments or debates in textual content
Negative Logits
cler       -0.72
livest     -0.66
Seym       -0.65
ookie      -0.64
eco        -0.64
idays      -0.63
aches      -0.62
Carbuncle  -0.62
Atomic     -0.62
lights     -0.61
Positive Logits
ative      1.12
uments     1.10
against    1.06
arguments  0.92
abl        0.88
ument      0.87
persu      0.86
argument   0.86
Against    0.83
arguing    0.83
Activations Density: 0.032%