INDEX
    Explanations

    phrases indicating generalizations or qualifiers in statements

    Negative Logits
    Token             Logit
    "uder"            -0.16
    " Jou"            -0.15
    "eries"           -0.15
    "oure"            -0.14
    "uve"             -0.14
    "Escort"          -0.14
    " ÙĪØ§ÙĦس"        -0.14
    "utory"           -0.13
    "ево"             -0.13
    " Fres"           -0.13

    Positive Logits
    Token             Logit
    "contro"          0.15
    "ambi"            0.15
    "лага"            0.14
    "istrovstvÃŃ"     0.14
    "805"             0.14
    "olen"            0.13
    " as"             0.13
    "Ïīμα"            0.13
    " apart"          0.13
    "contra"          0.13

    Act Density 0.034%

    No Known Activations