INDEX
Explanations
the word "Parents"
references to parents or parental figures
Negative Logits
Token        Logit
Flavoring    -0.79
Kingdoms     -0.77
Antar        -0.72
atility      -0.67
paio         -0.65
ibaba        -0.64
sclerosis    -0.64
Britann      -0.64
jriwal       -0.63
Baldwin      -0.62
Positive Logits
Token        Logit
hetical      1.52
hesis        1.37
hetically    1.19
heses        1.13
parents      0.94
hes          0.90
hood         0.84
Parents      0.78
father       0.74
parent       0.70
Activations Density 0.039%