INDEX
    Explanations

    phrases related to limitations, conditions, and clarifications in research studies or discussions

    Negative Logits
    Token                 Logit
    "Autoritní"           -1.00
    "rungsseite"          -0.84
    "DoubleQuotes"        -0.80
    " ModelExpression"    -0.78
    " autorytatywna"      -0.78
    "SharedDtor"          -0.77
    "########."           -0.76
    " betweenstory"       -0.69
    " otomatig"           -0.68
    " Мексичка"           -0.67
    Positive Logits
    Token                 Logit
    " sometimes"          1.23
    "Sometimes"           1.02
    "sometimes"           1.02
    " Sometimes"          1.01
    " some"               0.93
    " kadang"             0.84
    " parfois"            0.83
    " soms"               0.81
    "有时"                0.79
    "有时候"              0.78
    Activation Density: 0.411%

    No Known Activations