Explanations
the word 'ug' at varying activation levels
repeated mentions of the term "Guggenheim."
Negative Logits
peak       -0.70
Leilan     -0.69
calming    -0.65
compr      -0.62
cape       -0.61
infancy    -0.61
wards      -0.61
blocker    -0.60
targ       -0.60
Hemp       -0.60
Positive Logits
glers       1.42
uese        1.20
gery        1.15
uay         1.15
ged         1.08
ging        1.06
nant        1.00
gers        0.99
ger         0.96
ats         0.93
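The page does not say how the logit lists are produced. A common approach for sparse autoencoder dashboards, assumed here, is to take the dot product of the feature's decoder vector with each token's column of the model's unembedding matrix, then report the highest and lowest scores. The sketch below shows that ranking on a made-up toy vocabulary. The function names `feature_logit_weights` and `top_and_bottom` and all weights are hypothetical, not values from this feature.

```python
# Hypothetical sketch: rank the tokens a feature pushes up or down.
# ASSUMPTION: each score is the dot product of the feature's decoder
# vector with the unembedding column for that token. The toy matrices
# in the demo are made up and are not this feature's weights.


def feature_logit_weights(decoder_vec, unembed, vocab):
    """Return (token, score) pairs, score = decoder_vec . unembed[:, j].

    decoder_vec: d_model floats (the feature's decoder direction).
    unembed: d_model rows, each holding vocab_size floats.
    vocab: vocab_size token strings, aligned with unembed columns.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


if __name__ == "__main__":
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    decoder = [1.0, -0.5]
    unembed = [
        [1.0, 0.8, -0.3, -0.2],
        [0.0, 0.2, 0.4, 0.2],
    ]
    pos, neg = top_and_bottom(feature_logit_weights(decoder, unembed, vocab), 2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this assumption, the two lists above are simply the two ends of a single sorted score vector over the vocabulary.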
Activation Density: 0.012%
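The page does not define activation density either. A common convention, assumed here, is the percentage of sampled token positions on which the feature's activation is above zero. The helper `activation_density` below is hypothetical and only illustrates that definition.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where the feature's activation is strictly positive.
# ASSUMPTION: this is the definition the dashboard uses.


def activation_density(activations):
    """Return the percentage of entries in `activations` that are > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)
```

Under that definition, a density of 0.012% means the feature fired on roughly 12 of every 100,000 sampled tokens, which fits a narrow, token-specific feature like this one.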