INDEX
    Explanations

    occurrences of phrases that indicate agreements or conditions

    Negative Logits
    Token           Logit
    "ule"           -0.07
    "ú"             -0.06
    "op"            -0.06
    "ir"            -0.06
    "cul"           -0.06
    "ullan"         -0.06
    "_PRIORITY"     -0.06
    "YLE"           -0.05
    "sect"          -0.05
    "alam"          -0.05

    Positive Logits
    Token           Logit
    " any"          0.12
    " anything"     0.10
    "任何"          0.10
    "ใด"            0.09
    " everything"   0.09
    " all"          0.08
    " qualquer"     0.08
    "すべて"        0.08
    "any"           0.08
    ".any"          0.08
    Activation Density: 0.004%

    No Known Activations