INDEX
Explanations
conditional phrases related to decision-making or precautions in various contexts
safety and caution
to be safe, clear, or explicit
Negative Logits
itſelf    -0.75
myſelf    -0.66
OGND      -0.65
Theſe     -0.64
Eſ        -0.63
Efq       -0.63
perſon    -0.63
Anſ       -0.62
ſta       -0.61
ſelf      -0.60
Positive Logits
precaution       1.32
precautionary    1.16
precau           1.02
caution          1.00
safety           0.94
cautious         0.93
safest           0.93
safer            0.93
precautions      0.87
safety           0.86
Activations Density 0.178%