Explanations
phrases that indicate authorship or attribution related to actions or events
Negative Logits
houſe      -0.78
itſelf     -0.76
calvin     -0.74
Reſ        -0.72
purpoſe    -0.70
Chriftian  -0.70
zealand    -0.68
Oedipus    -0.68
Anſ        -0.67
Oilers     -0.67
Positive Logits
BY         1.52
by         1.44
By         1.30
oleh       1.27
by         1.26
Byers      1.26
BY         1.19
By         1.17
filterBy   1.14
selectBy   1.10
Activation density: 0.401%
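Activation density is usually read as the share of tokens on which the feature fires. The page does not define its threshold, so the sketch below assumes "fires" means an activation strictly greater than zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Toy example: 4 of 1000 tokens fire, giving a density of about 0.4%.
print(activation_density([0.0] * 996 + [0.5] * 4))
```

A density near 0.4% means the feature is active on roughly 4 tokens in every 1,000.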