    Negative Logits
    Token              Logit
    " Majefty"         -0.90
    " Prist"           -0.84
    " invid"           -0.83
    " Platon"          -0.77
    " Pries"           -0.77
    " Maur"            -0.75
    " pleaſure"        -0.74
    " Mauritania"      -0.74
    "OGND"             -0.72
    " kano"            -0.71

    Positive Logits
    Token              Logit
    " Se"               1.02
    " se"               0.96
    "Se"                0.89
    " Selig"            0.88
    " haberse"          0.86
    " להת"              0.84
    " SE"               0.81
                        0.80
    "RegressionTest"    0.76
    " Seidel"           0.74
    Activation Density: 0.058%

    No Known Activations