INDEX
Explanations
phrases related to escape and evasion
instances of the word "escape" and its variants
Negative Logits
inki       -0.75
reads      -0.71
ificent    -0.69
wow        -0.61
swear      -0.58
spare      -0.58
¾          -0.58
spir       -0.58
iop        -0.57
stead      -0.57
Positive Logits
detection      1.04
unsc           1.00
confinement    0.95
captivity      0.91
capture        0.84
apprehension   0.82
punishment     0.80
Torment        0.77
justice        0.77
prosecution    0.77
Activations Density: 0.042%