Explanations
information attributed to a spokesperson
references to spokespersons in various contexts
Negative Logits
dream        -0.69
cffff        -0.67
bows         -0.66
lif          -0.65
NetMessage   -0.65
avery        -0.64
burning      -0.64
panel        -0.62
spir         -0.62
levels       -0.61

Positive Logits
spokesperson  0.79
uted          0.78
Steph         0.78
atography     0.76
onse          0.76
olicy         0.74
clarified     0.73
hips          0.73
emailed       0.72
ials          0.72
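
Token lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding vectors and keeping the k tokens with the largest and smallest scores. The sketch below assumes that method; the helpers `logit_effects` and `top_and_bottom` and the toy two-dimensional vectors are hypothetical, and the dashboard that produced the numbers above may normalize or rescale its scores differently.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction pushes each token's logit (assumed method, illustrative names).
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative token scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative: the "Negative Logits" list
    top = ranked[::-1][:k]       # most positive: the "Positive Logits" list
    return top, bottom


if __name__ == "__main__":
    # Toy unembedding with made-up 2-d vectors, for illustration only.
    toy_unembed = {
        "spokesperson": [1.0, 0.0],
        "dream": [-1.0, 0.0],
        "panel": [0.0, 1.0],
    }
    top, bottom = top_and_bottom(logit_effects([0.8, 0.1], toy_unembed), k=1)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting once and slicing from both ends keeps the sketch simple; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.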

Activation Density: 0.029%
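
Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.029% corresponds to roughly 29 tokens out of every 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 and using a hypothetical `activation_density` helper:

```python
# Hypothetical helper: fraction of tokens whose feature activation exceeds
# a threshold (assumed definition of activation density).
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # One of four toy tokens fires, giving a density of 25%.
    print(f"{100 * activation_density([0.0, 0.0, 1.2, 0.0]):.3f}%")
    # Converting the reported 0.029% into tokens per 100,000.
    print(round(0.029 / 100 * 100_000))
```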