INDEX
Explanations
debates or discussions surrounding political or controversial topics
Negative Logits
culosis       -0.74
Exchange      -0.68
acket         -0.66
esa           -0.65
moil          -0.64
otonin        -0.63
Newsletter    -0.63
rehend        -0.63
ourse         -0.60
uates         -0.60
Positive Logits
cause         0.98
ya            0.87
huh           0.82
eh            0.69
sir           0.69
especially    0.69
pecially      0.69
congr         0.67
JV            0.66
laughs        0.65
Activations Density: 0.233%