Explanations
explicit warnings or cautions in a text
phrases that emphasize safety precautions and warnings
Negative Logits
Token          Logit
cart           -0.69
magically      -0.67
Founder        -0.66
descendants    -0.63
independents   -0.62
Spons          -0.61
creator        -0.61
Canad          -0.60
creator        -0.60
Joined         -0.59
Positive Logits
Token          Logit
beware         1.13
precautions    1.12
caution        1.11
Avoid          1.02
Avoid          0.99
avoid          0.98
lest           0.96
carefully      0.94
heed           0.94
precaution     0.92
Activations Density: 0.845%