Explanations
references to betrayal and treason
words related to betrayal or disloyalty
Negative Logits
Token          Logit
gor            -0.75
EVA            -0.72
Electric       -0.71
Roll           -0.70
Morning        -0.68
Cancel         -0.68
Environment    -0.67
Delivery       -0.65
Sample         -0.65
arel           -0.64
Positive Logits
Token          Logit
traitor        0.96
treason        0.94
sympath        0.93
plotting       0.90
guiActiveUn    0.86
plot           0.82
uous           0.81
subversive     0.81
plotted        0.81
salute         0.79
Activations Density: 0.025%