INDEX
Explanations
phrases indicating negligence and its consequences, particularly in the context of safety and liability
Negative Logits
çĬ¯          -0.18
272          -0.17
æĹı          -0.15
Trou         -0.15
ardon        -0.15
Trou         -0.14
WithError    -0.14
uffers       -0.14
ounters      -0.13
linger       -0.13
Positive Logits
loss         0.18
outright     0.18
lack         0.16
lost         0.16
bad          0.16
downright    0.16
loss         0.16
threats      0.14
missing      0.14
Ñģли         0.14
Activations Density 0.387%