INDEX
    Explanations

    phrases related to validation or verification

    terms related to validity or legitimacy

    Negative Logits
    Token           Logit
    "xual"          -0.79
    "hedon"         -0.73
    " preferring"   -0.66
    " traged"       -0.65
    "mania"         -0.62
    "hell"          -0.62
    " superflu"     -0.62
    " Alive"        -0.62
    "hedral"        -0.61
    "Mania"         -0.60
    Positive Logits
    Token           Logit
    "ating"         1.52
    "ators"         1.41
    "ator"          1.33
    "ates"          1.21
    "ated"          1.15
    "ations"        1.09
    "atable"        1.05
    "ATING"         1.04
    "ation"         0.97
    "iated"         0.92
    Act Density: 0.026%

    No Known Activations