    Negative Logits
    (Tokens are quoted; a leading space is part of the token.)
        " nested"          -0.10
        " mood"            -0.09
        " ‐" (U+2010)      -0.09
        " hitch"           -0.09
        "unge"             -0.09
        " rall"            -0.09
        " wartime"         -0.09
        "meyi"             -0.09
        "idot"             -0.08
        "uries"            -0.08
    Positive Logits
        " Battle"          0.17
        " Waterloo"        0.16
        " Marathon"        0.15
        " Hastings"        0.14
        "Battle"           0.14
        " battle"          0.13
        " engagements"     0.12
        "battle"           0.12
        " Therm"           0.12
        " slag"            0.11
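Lists like the two above are often produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the k lowest and k highest scores. The sketch below assumes exactly that (decoder row times W_U, no extra normalization); whether this dashboard also folds in a final LayerNorm or rescales the scores is not stated, and the function name `top_logit_effects` is hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption: effect = decoder direction (d_model,) @ unembedding W_U (d_model, n_vocab).
    Returns (negative, positive) lists of (token, score) pairs.
    """
    effects = w_dec_row @ W_U              # one score per vocabulary token
    order = np.argsort(effects)            # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (hypothetical 2-dim model, 4-token vocabulary) reusing a few scores from above.
vocab = [" Battle", " mood", " Waterloo", " nested"]
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[0.17, -0.09, 0.16, -0.10],
                [0.50, 0.50, 0.50, 0.50]])
negative, positive = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(negative)  # lowest-scoring tokens
print(positive)  # highest-scoring tokens
```

Only the first row of the toy W_U matters here because the decoder row is zero in its second coordinate; in a real model every dimension contributes.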
    Activation Density: 0.064%
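Activation density is read here as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard does not state its threshold, and `activation_density` is a hypothetical helper. At 0.064%, the feature would fire on roughly 64 of every 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is strictly > threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical activations: 64 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:64] = 1.5
print(activation_density(acts))  # percentage of active positions
```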

    No Known Activations