INDEX
Explanations
expressions of self-doubt and vulnerability
Negative Logits
äºĪ          -0.19
ignon        -0.17
ätz          -0.15
angi         -0.15
scar         -0.14
OLUTE        -0.14
auer         -0.14
ůž           -0.14
Ignoring     -0.14
)↵↵↵↵↵↵↵↵    -0.13
Positive Logits
doubts       0.27
doubt        0.25
feelings     0.24
insecure     0.23
feeling      0.22
doub         0.22
Doub         0.21
inse         0.21
internal     0.20
feels        0.20
Activations Density 0.285%
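
As a rough illustration of the density figure above, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature has a nonzero activation. That reading is an assumption, not something the page states. The function name `activation_density` and the sample data are hypothetical, chosen only to reproduce a value of 0.285%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density counts positions with activation > threshold
    (zero by default), divided by the total number of positions.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 200,000 token positions, 570 of them active.
sample = [0.0] * 199_430 + [1.0] * 570

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.285% corresponds to the feature firing on roughly 3 out of every 1,000 tokens.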