INDEX
    Explanations

    phrases indicating purpose or intention related to an action

    Negative Logits
    ovna
    -0.17
    anlık
    -0.15
    cia
    -0.15
     düşür
    -0.15
    イツ
    -0.14
    朗
    -0.14
    ei
    -0.14
    .struts
    -0.14
    anford
    -0.14
    adan
    -0.14
    Positive Logits
     justice
    0.32
     differently
    0.28
     Justice
    0.27
    justice
    0.27
     wrong
    0.26
    Justice
    0.24
     Wrong
    0.20
     backwards
    0.19
    wrong
    0.18
     WRONG
    0.18
    Activation Density 0.040%

    No Known Activations