Explanations
references to the concept of dawn or related imagery
Negative Logits
ttle      -0.79
chn       -0.75
idium     -0.74
glomer    -0.72
apter     -0.71
Wiki      -0.69
Mellon    -0.68
orie      -0.67
akia      -0.67
chens     -0.66
Positive Logits
ing       0.95
dawn      0.88
hower     0.86
fall      0.85
mare      0.81
mares     0.81
shire     0.81
timers    0.80
break     0.77
ly        0.77
Activations Density 0.005%