INDEX
Explanations
phrases indicating urgency or a critical situation
Negative Logits
éļª         -0.15
ì»          -0.15
224         -0.15
imple       -0.15
peek        -0.14
engkap      -0.14
ãĥĥãĥĪ      -0.14
meler       -0.14
нÑĮ         -0.13
TextWriter  -0.13
Positive Logits
DEST        0.28
absolutely  0.28
obl         0.27
sm          0.25
smoked      0.24
torch       0.24
dec         0.24
tear        0.23
destroy     0.23
destroy     0.23
Activations Density: 0.333%