INDEX
Explanations
expressions of enthusiasm and emotional engagement
Negative Logits
lug         -0.15
bounce      -0.15
dsl         -0.15
ags         -0.14
emade       -0.14
ificates    -0.14
elijke      -0.14
Ub          -0.14
æk          -0.13
Heard       -0.13
Positive Logits
entr        0.25
ent         0.23
aptured     0.20
ens         0.19
capt        0.18
üst         0.18
captive     0.17
ench        0.16
eb          0.16
ron         0.16
Activations Density 0.030%