Explanations
familial relationships and dynamics
children and siblings
Negative Logits
beſte      -0.80
eſſ        -0.76
pleaſure   -0.75
faſt       -0.73
ſta        -0.73
queſta     -0.73
ſtate      -0.73
ſua        -0.72
ſelf       -0.71
purpoſe    -0.70
Positive Logits
spoiled     0.69
spoilt      0.68
spoiling    0.56
adored      0.50
pam         0.48
spoil       0.44
宠          0.42
born        0.40
sibling     0.39
cherished   0.39
Activations Density 0.041%