Explanations
references to situations or descriptions that generate fear or panic
phrases related to fear or alarming situations
Negative Logits
cially          -0.78
iciency         -0.72
offic           -0.71
slightest       -0.71
authenticated   -0.70
entials         -0.69
edded           -0.68
odore           -0.68
arrang          -0.67
tein            -0.65
Positive Logits
crow            1.80
mong            1.33
warts           0.91
ingly           0.89
bite            0.76
Cry             0.75
wart            0.75
faced           0.73
fu              0.72
abies           0.71
Activations Density: 0.023%