Explanations
the word "fountain" at various activations
references to fountains
Negative Logits
Token        Logit
ACTIONS      -0.65
VICE         -0.65
WATCHED      -0.64
eon          -0.64
Debor        -0.62
alien        -0.61
Anarchy      -0.59
Ey           -0.59
nd           -0.59
Cosponsors   -0.59
Positive Logits
Token        Logit
fountain     1.28
Fountain     1.27
pens         0.87
issance      0.81
hesda        0.80
empt         0.78
pen          0.77
sonian       0.76
tailed       0.73
shire        0.73
Activations Density: 0.004%