INDEX
    Explanations

    The occurrence of specific named entities or proper nouns.

    Negative Logits
    .~(\             -0.56
     Xunit           -0.55
    ictured          -0.55
    rovna            -0.54
     belong          -0.54
     않습니다        -0.53
    belongs          -0.52
    kowitz           -0.52
                     -0.51
    zustellen        -0.51
    Positive Logits
     Italijani       0.68
     članak          0.68
    ftagPool         0.67
    AutoModerator    0.65
    contentLoaded    0.61
    CloseOperation   0.61
     дописавши       0.60
     lenker          0.59
     دیکھیے          0.59
    ConstraintMaker  0.58
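    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The dashboard does not say how these scores were computed; a common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and take the bottom-k and top-k entries. The function name `top_logit_effects` and the toy arrays below are hypothetical, for illustration only.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumption: the direct logit effect is decoder_vec @ W_U, with
    decoder_vec of shape (d_model,) and W_U of shape (d_model, d_vocab).
    """
    logits = decoder_vec @ W_U          # shape (d_vocab,)
    order = np.argsort(logits)          # indices sorted by ascending logit
    # Most negative first, matching the "Negative Logits" ordering.
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    # Most positive first, matching the "Positive Logits" ordering.
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, four-token vocabulary.
    decoder_vec = np.array([1.0, 0.0])
    W_U = np.array([[0.5, -0.2, 0.1, -0.6],
                    [9.0, 9.0, 9.0, 9.0]])
    neg, pos = top_logit_effects(decoder_vec, W_U, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    Sorting once with `argsort` and reading both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.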
    Act Density 0.191%
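    An activation density of 0.191% means the feature fires on roughly 1.9 of every 1,000 tokens in whatever dataset the dashboard sampled. A minimal sketch, assuming density is the percentage of tokens whose activation is strictly above zero (the exact threshold and dataset are not stated here); `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature activation exceeds threshold.

    Assumption: "density" counts activations strictly greater than threshold.
    """
    activations = np.asarray(activations)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size


if __name__ == "__main__":
    # Hypothetical data: 191 active tokens out of 100,000.
    acts = np.zeros(100_000)
    acts[:191] = 0.8
    print(f"{activation_density(acts):.3f}%")
```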

    No Known Activations