INDEX
Explanations
terms associated with serious harm or injury
Negative Logits
bye             -0.73
LCS             -0.61
Eag             -0.60
Tanz            -0.58
foam            -0.58
Principal       -0.58
Finn            -0.56
blindly         -0.55
Noise           -0.55
electronically  -0.55
Positive Logits
ous             2.73
ously           2.47
OUS             1.57
osity           1.38
iously          1.34
istically       1.30
istic           1.28
ized            1.22
ious            1.22
izing           1.22
Activations Density: 0.018%