INDEX
Explanations
phrases related to actions or events involving personal risk or urgency
phrases that indicate concern or fear for someone's safety or well-being
Negative Logits
thanking     -0.77
ONLY         -0.61
Citation     -0.60
alot         -0.60
EVERY        -0.57
continuing   -0.56
summar       -0.56
THIS         -0.56
freaking     -0.53
referring    -0.53
Positive Logits
aband        0.86
sie          0.79
fit          0.73
effect       0.73
atches       0.70
ventures     0.66
ares         0.65
oots         0.65
luster       0.64
iously       0.63
Activations Density 0.853%