    Explanations

    references to social awkwardness or uncomfortable situations

    Negative Logits
    Token           Logit
    "ched"          -0.16
    "hook"          -0.16
    "éĽĦ"           -0.16
    "çī"            -0.16
    "ita"           -0.15
    "allon"         -0.15
    "leta"          -0.15
    " Hook"         -0.15
    "@student"      -0.15
    "пеÑĩ"          -0.15

    Positive Logits
    Token           Logit
    "launcher"      0.15
    "dialog"        0.14
    "оÑĢон"         0.14
    "sil"           0.14
    " dialogue"     0.14
    "afil"          0.14
    "itol"          0.13
    "ais"           0.13
    " situations"   0.13
    "ãĥįãĥ«"        0.13
    Activation Density 0.030%

    No Known Activations