    Explanations

    expressions about the significance or relevance of a concept or phenomenon

    Negative Logits
    Token            Logit
    " major"         -0.40
    "Autoritní"      -0.40
    "有力"           -0.36
    "major"          -0.34
                     -0.33
    " funny"         -0.31
    "openModal"      -0.31
    " shiny"         -0.30
    " studio"        -0.30
                     -0.30
    Positive Logits
    Token            Logit
    " importance"    3.59
    " Importance"    3.16
    "importance"     3.11
    "Importance"     3.06
    " importancia"   2.67
    " importância"   2.44
    "importanza"     2.30
    " significance"  2.23
    " importanza"    2.13
    " Bedeutung"     2.08
    Activation Density: 0.080%

    No Known Activations