INDEX
    Explanations

    references to attributes or characteristics

    Negative Logits
    Token       Logit
    oodle       -0.21
    enerator    -0.20
    ores        -0.19
    oz          -0.17
    ok          -0.16
    oa          -0.16
    encia       -0.16
    tring       -0.16
    ing         -0.16
    oen         -0.16
    Positive Logits
    Token       Logit
    actions     0.20
    onom        0.20
    IBUTE       0.19
    senal       0.18
    idge        0.18
    ract        0.18
    avers       0.17
    raction     0.17
    onaut       0.17
    IBUTES      0.17
    Act Density 0.045%

    No Known Activations