    Explanations

    phrases that indicate privacy protection and data handling

    Negative Logits
    aptop
    -0.16
    issing
    -0.16
    lus
    -0.16
     almost
    -0.14
    uzey
    -0.14
    osite
    -0.14
     bastante
    -0.14
    innie
    -0.14
    аж
    -0.14
     sometimes
    -0.14
    Positive Logits
     nor
    0.38
    nor
    0.33
     EVER
    0.25
     Nor
    0.24
    Nor
    0.23
     NOR
    0.22
     ever
    0.21
     knowingly
    0.19
    也不
    0.18
    -ever
    0.17
    Act Density 0.251%

    No Known Activations