    Explanations

    phrases with following words

    Negative Logits
    "("                 0.48
    "^_^"               0.47
    "sounds"            0.46
    (token not shown)   0.44
    "гах"               0.43
    "гация"             0.43
    " más"              0.41
    "nahmen"            0.41
    " Capcom"           0.41
    "TOR"               0.41
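    Positive Logits
    (token not shown)   0.52
    " sozinho"          0.51
    (token not shown)   0.50
    (token not shown)   0.48
    (token not shown)   0.47
    " unfolded"         0.46
    (token not shown)   0.46
    " alone"            0.46
    "ject"              0.45
    " Ș"                0.45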
    Activation Density: 0.001%

    No Known Activations