INDEX
    Explanations

    references to religious figures and titles

    Negative Logits
    Token         Logit
    " Atra"       -0.67
    " Egli"       -0.67
    "eraus"       -0.65
    "rateful"     -0.65
    " Bris"       -0.63
    " ali"        -0.61
    "Atra"        -0.61
    "vrons"       -0.61
    " Dalla"      -0.60
    "]];"         -0.59

    Positive Logits
    Token         Logit
    " Lord"       2.16
    " LORD"       1.97
    " lord"       1.96
    "Lord"        1.93
    " Lords"      1.78
    "LORD"        1.69
    " lords"      1.63
    "lord"        1.55
    " Seigneur"   1.28
    "lords"       1.19

    Activation Density: 0.024%

    No Known Activations