Explanations
occurrences of the word "step"
Negative Logits
Token         Logit
Unic          -0.76
orem          -0.75
Pengu         -0.74
ortunately    -0.72
ILLE          -0.69
tiss          -0.69
eatures       -0.69
ominated      -0.67
essage        -0.65
inately       -0.64
Positive Logits
Token         Logit
daughter      1.18
dad           1.11
brother       1.04
mother        0.98
father        0.98
hens          0.89
isters        0.88
steps         0.88
mom           0.84
step          0.83
Activations Density: 0.017%