INDEX
    Explanations

    expressions related to social interactions and relationships

    Negative Logits
     ...          -0.35
     "            -0.28
     â̦           -0.27
    ...           -0.24
                  -0.23
    -↵            -0.23
     -↵           -0.22
     -            -0.22
     [            -0.21
                  -0.21

    Positive Logits
    .Companion    0.15
     («           0.14
    ãĢįãĢĮ        0.14
    _marshall     0.14
    uitka         0.14
     `%           0.14
    (crate        0.13
     putas        0.13
    伸            0.13
    ".$_          0.13

    Act Density 0.028%

    No Known Activations