INDEX
Explanations
personal anecdotes and memories
references to personal memories or experiences from childhood
Negative Logits
Token        Logit
###          -0.80
Auth         -0.78
Update       -0.78
yip          -0.73
intends      -0.73
ERG          -0.72
xit          -0.71
Profit       -0.71
Leaks        -0.71
confir       -0.70
Positive Logits
Token        Logit
classmates   1.28
summers      1.24
babys        1.13
preschool    1.08
Dad          1.07
classmate    1.05
taught       1.03
puberty      1.03
my           0.99
Dad          0.97
Activations Density: 0.559%