Explanations
family relationships
mentions of parental figures and family relationships
Negative Logits
hikers      -0.69
"]=>        -0.67
Increases   -0.65
cerning     -0.63
umbn        -0.63
owler       -0.63
=>          -0.62
awar        -0.62
estyles     -0.61
intensity   -0.61
Positive Logits
died        1.00
divorced    0.95
uncle       0.80
ma          0.80
perished    0.77
hood        0.76
deceased    0.73
disappro    0.73
ancestors   0.73
ancest      0.72
Activation Density: 0.066%