Explanations
references to physical harm or danger in a context of warnings or incidents
keywords related to injuries and safety concerns
Negative Logits
ku          -0.86
liam        -0.75
kt          -0.74
ãĤ©         -0.72
atl         -0.71
utm         -0.68
kh          -0.68
last        -0.68
ãĤ¨         -0.66
achment     -0.64
Positive Logits
nor          1.15
anymore      1.06
whatsoever   1.03
Initialized  0.73
slightest    0.68
;}           0.68
detectable   0.68
anybody      0.67
anything     0.66
uyomi        0.66
Activations Density 0.653%