INDEX
    Explanations

    instances of specific phrases or concepts related to influence and personal relationships

    Negative Logits
    " civil"     -0.15
    "ibe"        -0.15
    "ÑĪки"       -0.15
    " boil"      -0.14
    "erk"        -0.14
    "ÑĪка"       -0.14
    " blame"     -0.14
    " OK"        -0.14
    "kos"        -0.14
    "ts"         -0.14
    Positive Logits
    "endoza"     0.17
    " Sesso"     0.16
    "UAGE"       0.15
    "rana"       0.14
    "ÙĬÙĥا"      0.14
    "¼åIJĪ"      0.14
    "lied"       0.14
    "ãĤ¯ãĤ»"     0.14
    "ourn"       0.13
    "awe"        0.13
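The page does not say how these logit lists are computed. A common approach is to project the feature's decoder direction through the model's unembedding matrix and then take the tokens with the lowest and highest scores. The following is a minimal pure-Python sketch of that approach on toy numbers. It ignores any final layer norm. The function name, argument layout, and data are illustrative assumptions, not the page's actual procedure.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k):
    """Rank tokens by (decoder direction) @ (unembedding); hypothetical helper.

    decoder_dir: one weight per residual-stream dimension (length d_model)
    unembed:     d_model rows, each holding one weight per vocabulary entry
    vocab:       token strings aligned with the columns of unembed
    """
    scores = [0.0] * len(vocab)
    # Accumulate the dot product of decoder_dir with each vocabulary column.
    for d_i, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of three placeholder tokens.
neg, pos = top_logit_effects(
    decoder_dir=[1.0, -1.0],
    unembed=[[0.5, 0.0, -0.2],
             [0.1, 0.3, 0.0]],
    vocab=["a", "b", "c"],
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

With a real model, `vocab` would hold the tokenizer's strings, such as the byte-level forms listed above, and `k` would be 10 to match this page.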
    Activation Density: 0.001%
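Activation density is usually read as the percentage of sampled token positions on which the feature fires. A density of 0.001% works out to roughly one token in 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the threshold are assumptions, because the page does not state its dataset or cutoff.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold` (hypothetical)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: one nonzero activation among 100,000 token positions.
acts = [0.0] * 99_999 + [2.5]
print(f"{activation_density(acts):.3f}%")
```

On this toy data the function returns 0.001, which matches the density reported above.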

    No Known Activations