INDEX
    Explanations

    negations or expressions of disagreement

    Negative Logits
    Token          Logit
    "geç"          -0.15
    " Moreno"      -0.15
    " Mature"      -0.15
    "endra"        -0.14
    " gel"         -0.14
    " Deferred"    -0.14
    "by"           -0.14
    "376"          -0.13
    "WARNING"      -0.13
    " filmy"       -0.13

    Positive Logits
    Token          Logit
    "ãģŁãĤī"       0.16
    "Bins"         0.15
    "yg"           0.14
    "tach"         0.14
    "££"           0.14
    "tae"          0.14
    ".lu"          0.13
    "/xhtml"       0.13
    "quier"        0.13
    " Lust"        0.13

    Act Density 0.104%
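
The density figure is easier to read as a rate. The sketch below assumes "Act Density" means the fraction of token positions on which the feature has a nonzero activation; the page itself does not define the term, and the function name, threshold parameter, and toy data are hypothetical illustrations, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: the source page does not say how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: one active position out of 1000 tokens.
toy = [0.0] * 999 + [2.5]
density = activation_density(toy)
print(f"toy density: {density:.3%}")

# Reading the reported figure as a rate: 0.104% of tokens.
reported = 0.104 / 100
tokens_per_activation = round(1 / reported)
print(f"about one active token in every {tokens_per_activation}")
```

Under that assumption, a density of 0.104% corresponds to the feature firing on roughly one token in every 962.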

    No Known Activations