INDEX
Explanation
References to different individuals, along with a brief description or background information about them
NEGATIVE LOGITS
canon           -0.86
eus             -0.78
assassination   -0.75
infall          -0.75
Ministers       -0.69
throne          -0.69
discredited     -0.68
tainted         -0.68
treason         -0.67
ransom          -0.67
POSITIVE LOGITS
veland           0.90
biking           0.81
volunteering     0.78
Redditor         0.78
commuting        0.77
oola             0.75
crochet          0.74
autistic         0.74
knitting         0.74
unemployed       0.73
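Lists of top positive and negative logits like the two above are commonly derived by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below illustrates that ranking step under that assumption only; the function names, the toy vocabulary, and the toy matrices are hypothetical, and the dashboard that produced these values may rescale or otherwise post-process them.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumes effect = decoder_vec @ unembed, where unembed is (d_model x vocab).
# All names and numbers here are illustrative, not taken from the dashboard.

def logit_effects(decoder_vec, unembed):
    """Return one score per vocabulary column: decoder_vec @ unembed."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(vocab_size)
    ]

def top_and_bottom(tokens, scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    pairs = list(zip(tokens, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
tokens = ["biking", "knitting", "treason", "throne"]
unembed = [
    [0.9, 0.7, -0.6, -0.5],
    [0.1, 0.2, -0.3, -0.1],
]
decoder_vec = [1.0, 0.5]

scores = logit_effects(decoder_vec, unembed)
positive, negative = top_and_bottom(tokens, scores, k=2)
for token, score in positive:
    print(f"{token:<16}{score:5.2f}")
for token, score in negative:
    print(f"{token:<16}{score:5.2f}")
```

Sorting the full score list twice keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.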
Activation Density 0.529%
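Activation density is usually read as the share of token positions on which the feature fires, so 0.529% corresponds to roughly 5 tokens in every 1,000. The sketch below computes that percentage under the assumption that "fires" means an activation strictly above a threshold of zero; the function name, the threshold parameter, and the sample activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# where the feature's activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy sample: the feature fires on 1 of 10 positions.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"Activation Density {activation_density(acts):.3f}%")
```

A real measurement would aggregate over a large sample of tokens; at 0.529%, a sample of a few thousand tokens would contain only a few dozen activating positions.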