Explanations
mentions of specific individuals and their actions or situations
Negative Logits
Token         Logit
selves        -0.93
unison        -0.86
selves        -0.84
results       -0.69
Recommend     -0.67
asses         -0.66
OTAL          -0.66
collective    -0.66
Consumers     -0.65
mination      -0.65
Positive Logits
Token         Logit
himself       1.66
Himself       1.19
assassinated  1.06
his           1.02
herself       0.97
famously      0.94
personally    0.94
enegger       0.89
resigned      0.88
persona       0.88
Activations Density 8.478%