INDEX
    Explanations

    phrases describing manners or methods of actions

    Negative Logits
    Token         Logit
    "unar"        -0.15
    "unate"       -0.15
    "лов"         -0.15
    "onga"        -0.14
    "sic"         -0.14
    "undy"        -0.14
    "pheric"      -0.14
    "管"          -0.14
    "airs"        -0.13
    "-ves"        -0.13

    Positive Logits
    Token         Logit
    " manner"     0.23
    " fashion"    0.21
    " thức"       0.20
    "isms"        0.19
    "ward"        0.18
    "/place"      0.16
    " Claw"       0.15
    " way"        0.15
    " ways"       0.14
    " Tamb"       0.14

    Activation Density: 0.039%

    No Known Activations