INDEX
Explanations
instances of the word "appeal."
references to appeals made to various groups or demographics
Negative Logits
sterdam    -0.88
llan       -0.75
pite       -0.75
hesda      -0.72
alde       -0.65
lance      -0.64
unused     -0.63
Laur       -0.63
awarded    -0.63
ikes       -0.61
Positive Logits
gauge       0.86
caution     0.76
姫          0.73
asty        0.70
strings     0.68
reconsider  0.68
reassure    0.68
Gau         0.67
heights     0.66
impulse     0.65
Activations Density 0.083%