INDEX
Explanations
phrases encouraging or discouraging specific behaviors
phrases that encourage positive behaviors and attitudes
Negative Logits
PsyNetMessage    -0.63
Explosion        -0.62
Prescott         -0.62
Lines            -0.61
suffice          -0.60
reconstruction   -0.60
Topics           -0.60
Strait           -0.59
autopsy          -0.59
fray             -0.59
Positive Logits
able       1.14
thankful   1.13
aware      1.09
afraid     1.02
honest     1.01
grateful   1.01
careful    1.01
friends    0.96
ashamed    0.96
proud      0.95
Activations Density: 0.167%