Explanations
phrases indicating fear or lack of fear in various contexts
instances of fear or anxiety-related terms
Negative Logits
Token      Logit
issance    -0.80
cise       -0.76
byter      -0.69
akeru      -0.69
eret       -0.67
erry       -0.66
anmar      -0.66
ankind     -0.65
eor        -0.65
dates      -0.65
Positive Logits
Token      Logit
lest       1.16
afraid     0.87
of         0.79
NESS       0.74
ptin       0.72
lessly     0.72
onda       0.71
thereof    0.71
mong       0.71
crow       0.70
Activations Density 0.031%