    Explanations

    various expressions of attitudes, particularly negative and hostile ones

    Negative Logits
    orman
    -0.17
    ekim
    -0.16
    Callable
    -0.15
    ijo
    -0.14
    PN
    -0.14
    onom
    -0.14
    งาน
    -0.14
    _prim
    -0.14
    lico
    -0.13
     Pant
    -0.13
    Positive Logits
     towards
    0.81
     toward
    0.77
     Towards
    0.65
    Towards
    0.59
     hacia
    0.54
     Tow
    0.54
    owards
    0.43
    向
    0.41
    oward
    0.40
     verso
    0.40
    Act Density 0.203%

    No Known Activations