INDEX
    Explanations

    expressions related to informative content and personal experiences

    Negative Logits
    Token             Logit
    "astify"          -0.69
    "Попис"           -0.57
    "twór"            -0.55
    "Географија"      -0.54
    "锈钢"            -0.53
    "bebe"            -0.51
    "hlon"            -0.51
    "ytale"           -0.51
    "apunov"          -0.49
    "Palabras"        -0.49
    Positive Logits
    Token             Logit
    " informative"    1.16
    " enlightening"   0.98
    " interesting"    0.98
    " useful"         0.95
    " helpful"        0.90
    "Inform"          0.90
    "inform"          0.90
    " insightful"     0.89
    " illuminating"   0.88
    " Inform"         0.86
    Activation Density: 0.366%

    No Known Activations