    Explanations

    references to publication volumes

    Negative Logits
    Token            Logit
    " myſelf"        -0.86
    " pleaſure"      -0.80
    "wiſe"           -0.77
    "eriksaan"       -0.75
    " Monfieur"      -0.74
    " themſelves"    -0.73
    " himſelf"       -0.73
    "✨:"            -0.71
    " Theſe"         -0.68
    " itſelf"        -0.67
    Positive Logits
    Token            Logit
    " Vol"           2.56
    " vol"           2.53
    "Vol"            2.45
    "vol"            2.38
    " VOL"           2.25
    "VOL"            2.08
    " Vols"          1.70
    " vols"          1.52
    "vols"           1.35
    " Volker"        1.18
    Activation Density: 0.032%

    No Known Activations