INDEX
    Explanations

    themes related to emotional states and social interactions

    NEGATIVE LOGITS
     due
    -0.22
    due
    -0.21
    嘛
    -0.18
     Due
    -0.17
    _due
    -0.17
     thanks
    -0.17
    uld
    -0.16
    Due
    -0.16
    olk
    -0.15
    thers
    -0.15
    POSITIVE LOGITS
     because
    0.41
    because
    0.38
     porque
    0.36
     Because
    0.36
    Because
    0.36
    ，因为
    0.34
    因为
    0.33
     perché
    0.32
     потому
    0.32
    ecause
    0.28
    Act Density 0.275%

    No Known Activations