Explanations
phrases relating to being unharmed or safe
mentions of the word "har" or related variations, possibly indicating a focus on the term's usage in various contexts
Negative Logits
Token     Logit
htaking   -0.91
lect      -0.81
essee     -0.78
uring     -0.70
hift      -0.70
BOOK      -0.70
urally    -0.69
olve      -0.67
fuzz      -0.64
chalk     -0.64
Positive Logits
Token     Logit
assment    1.01
tha        0.99
allel      0.93
Tsarnaev   0.91
vard       0.90
riors      0.89
riage      0.85
riages     0.84
ashtra     0.82
adow       0.80
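The dashboard does not say how these logit lists are produced. A common approach, assumed here, is a "logit lens" on the feature: multiply the feature's decoder vector by the model's unembedding matrix and keep the tokens with the most negative and most positive scores. The sketch below shows that idea on toy numbers only. The names decoder_dir, W_U, and top_logit_effects are hypothetical, and any normalization a real dashboard applies is skipped.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through an unembedding matrix
    and return the k most negative and k most positive (token, score) pairs.
    Assumes decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size)."""
    logits = decoder_dir @ W_U               # one score per vocabulary token
    order = np.argsort(logits)               # token indices, ascending by score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = np.array([1.0, 0.5])
W_U = np.array([[1.0, -1.0, 0.0, 0.5],
                [0.0, 0.0, 1.0, -2.0]])
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('b', -1.0), ('d', -0.5)]
print(positive)  # [('a', 1.0), ('c', 0.5)]
```

Under this reading, a token such as "assment" sitting at the top of the positive list means the feature's direction pushes the model toward predicting that token next.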
Activation density: 0.066%
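The dashboard does not define this figure. A common definition, assumed here, is the percentage of tokens in some sample on which the feature's activation is above zero. Under that reading, 0.066% means the feature fires on roughly 66 of every 100,000 tokens. The function name activation_density and its threshold parameter are illustrative, and the size of the sample the dashboard used is not stated.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy sample: 100,000 tokens, with the feature active on 66 of them.
acts = np.zeros(100_000)
acts[:66] = 1.0
print(activation_density(acts))  # prints 0.066
```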