INDEX
    Explanations

    alternative approaches to problem-solving

    NEGATIVE LOGITS
    Token           Logit
    "usto"          -0.17
    "ovna"          -0.15
    "POSIT"         -0.15
    " trunk"        -0.15
    "/Area"         -0.14
    "#"             -0.14
    "íĨłíĨł"        -0.14
    "cia"           -0.14
    "EO"            -0.14
    "ei"            -0.14
    POSITIVE LOGITS
    Token           Logit
    " justice"      0.28
    " Justice"      0.24
    "justice"       0.23
    "Justice"       0.21
    " wrong"        0.21
    " differently"  0.18
    "-done"         0.18
    " backwards"    0.17
    " proud"        0.16
    " estilo"       0.16
    Act Density 0.075%

    No Known Activations