INDEX
Explanations
proper nouns referring to people or organizations
references to individuals or entities in health-related contexts
Negative Logits
shift          -0.66
unbeliev       -0.57
Principle      -0.56
venge          -0.56
sense          -0.56
grow           -0.55
detection      -0.55
iscovery       -0.54
realization    -0.54
union          -0.52
Positive Logits
did            1.47
didn           1.24
gave           1.15
took           1.14
waited         1.14
DID            1.13
didnt          1.12
stayed         1.12
went           1.10
withdrew       1.09
Activations Density 0.366%