INDEX
Explanations
phrases indicating caution, hesitation, or concern about potential negative consequences
references to caution or restrictions related to safety and privacy
Negative Logits
eer           -0.67
Nanto         -0.67
average       -0.67
Registered    -0.66
gas           -0.64
boot          -0.63
erial         -0.62
inals         -0.62
tool          -0.60
yssey         -0.60
Positive Logits
fear           1.18
fearing        1.14
fears          1.09
grounds        1.04
citing         1.03
concerns       1.02
reasons        1.00
reason         0.92
disagreements  0.90
objections     0.88
Activations Density 0.400%
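
The density figure can be read as a simple ratio. The sketch below is illustrative only: it assumes that "Activations Density" means the fraction of sampled tokens on which the feature's activation is above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, of which 4 activate the feature.
sample = [0.0] * 996 + [2.1, 0.7, 5.3, 1.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under that assumption, a density of 0.400% corresponds to the feature firing on roughly 4 of every 1000 tokens.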