Explanations
words related to survival or threats to survival
references to the concept of survival
Negative Logits
Token      Logit
hops       -0.76
imus       -0.75
nee        -0.71
gio        -0.70
Heard      -0.68
Address    -0.68
orate      -0.66
Bridges    -0.65
Dame       -0.65
accuses    -0.65
Positive Logits
Token      Logit
survival   3.95
Survival   2.58
surv       2.47
Surv       1.74
survive    1.68
Surviv     1.62
surviv     1.59
surviving  1.50
surv       1.45
Surv       1.43
Activations Density: 0.019%
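
One way to read the density figure is as the share of sampled token positions on which the feature fires at all. The sketch below illustrates that reading on made-up data. The function name `activation_density`, the zero threshold, and the sample of 100,000 tokens are all assumptions for illustration, not details taken from this page, and the dashboard may compute its figure differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the threshold of 0.0 is a hypothetical choice.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): 19 active positions out of 100,000.
acts = [0.0] * 99_981 + [1.5] * 19
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.019%
```

Under this reading, a density of 0.019% means the feature is active on roughly 19 of every 100,000 tokens, which is consistent with a feature tied to a narrow topic such as survival.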