    NEGATIVE LOGITS
    Token             Logit
    " trust"          -1.30
    " Trust"          -1.27
    "Trust"           -1.11
    "trust"           -1.09
    " TRUST"          -0.99
    " trusts"         -0.91
    " Trusts"         -0.78
    " trusting"       -0.77
    "TRUST"           -0.71
    "<bos>"           -0.69

    POSITIVE LOGITS
    Token             Logit
    " Reſ"             0.69
    "kjø"              0.63
    " Anſ"             0.62
    " mourut"          0.62
    " themſelves"      0.62
    " Houſe"           0.61
    "jee"              0.60
    " necessárias"     0.59
    " greateſt"        0.59
    " merve"           0.59

    Activation Density: 0.029%

    No Known Activations