Explanations
themes related to fear and vulnerability
NEGATIVE LOGITS
Token          Logit
ILI            -0.18
teb            -0.16
irritating     -0.16
frustrating    -0.15
ennen          -0.15
alian          -0.14
egrity         -0.14
vez            -0.14
zcze           -0.14
-lfs           -0.14
POSITIVE LOGITS
Token          Logit
fear           0.67
Fear           0.66
Fear           0.63
terrified      0.50
fears          0.50
fearful        0.50
frightened     0.48
scared         0.48
恐             0.46
afraid         0.46
Activations Density 0.486%
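
The dashboard does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below computes that fraction for a list of activations; the function name, the threshold parameter, and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations: the feature fires on 1 of 4 tokens.
sample = [0.0, 0.0, 0.5, 0.0]
print(f"{activation_density(sample):.3%}")

# Under the same reading, a density of 0.486% corresponds to roughly
# this many active tokens per million sampled tokens.
reported_density = 0.486 / 100
print(round(reported_density * 1_000_000))
```

Read this way, a density of 0.486% means the feature is sparse: it is active on about 4,860 of every million tokens.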