INDEX
Explanations
expressions of emotional struggle and resilience
Negative Logits
tc      -0.17
etz     -0.15
emer    -0.15
yx      -0.15
abad    -0.15
upe     -0.15
διο     -0.14
ived    -0.14
flo     -0.14
ep      -0.14
Positive Logits
handle      0.45
handles     0.38
handling    0.38
tolerate    0.38
Handle      0.37
toler       0.37
handle      0.37
tol         0.35
.handle     0.35
Handling    0.34
Activations Density: 0.128%