INDEX
Explanations
emotional expressions and personal growth experiences
Negative Logits
Token      Logit
enberg     -0.19
ecstatic   -0.19
oire       -0.16
uez        -0.15
contres    -0.15
Died       -0.14
iene       -0.14
ordion     -0.14
IPH        -0.13
ANEL       -0.13
Positive Logits
Token      Logit
gri        0.27
moved      0.27
swept      0.24
seized     0.24
struck     0.22
pulled     0.22
consumed   0.21
drawn      0.20
touched    0.20
propelled  0.20
Activations Density: 0.346%
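
An activation density figure like the one above is usually read as the fraction of sampled tokens on which the feature's activation is nonzero. The following is a minimal sketch under that assumed definition, not the dashboard's own code; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.346% would correspond to the feature firing on roughly 3 to 4 tokens per thousand sampled.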