INDEX
Explanations
indicators of emotional regulation and cognitive evaluation
Negative Logits
Fiesta     -0.79
perks      -0.73
Patriot    -0.72
ulhu       -0.72
Benny      -0.72
chops      -0.71
stunts     -0.71
Wiz        -0.70
Borders    -0.70
Mama       -0.70
Positive Logits
derived     1.28
biased      1.20
induced     1.17
mediated    1.16
treatment   1.16
directed    1.16
treated     1.15
negative    1.15
specific    1.15
adapt       1.14
Activation Density: 0.067%