    NEGATIVE LOGITS
     quæ          -0.72
     purpoſe      -0.72
     myſelf       -0.67
     Monfieur     -0.67
     pleaſure     -0.66
     ſtate        -0.65
     ſtre         -0.65
     raiſ         -0.64
     Majefty      -0.64
     ſta          -0.63

    POSITIVE LOGITS
     te           0.58
                  0.57
    TabStop       0.55
     en           0.52
    )|^{          0.52
    addCriterion  0.51
     ویکی‌پدیا     0.51
    Hochspringen  0.51
    IBOutlet      0.50
     esternos     0.49

    Activation Density: 0.016%

    No Known Activations