INDEX
Explanations
references to the concept of life or living entities
references to human lives and their well-being
Negative Logits
NES              -0.72
AMI              -0.71
CAST             -0.68
ority            -0.67
consolidation    -0.64
Null             -0.64
Rat              -0.63
Syndicate        -0.62
iban             -0.62
NESS             -0.60
Positive Logits
chool            1.10
cape             0.92
ynthesis         0.91
pring            0.88
hack             0.82
lihood           0.81
erver            0.81
paces            0.79
ongs             0.79
guard            0.78
Activations Density: 0.022%