INDEX
    Explanations

    names and familial relationships

    Negative Logits
    "cola"         -0.15
    " blindly"     -0.14
    "tak"          -0.14
    "_DIS"         -0.14
    "lex"          -0.14
    " neutral"     -0.14
    "dr"           -0.13
    "zu"           -0.13
    "834"          -0.13
    " follower"    -0.13
    Positive Logits
    "ajaran"       0.16
    "τιν"          0.15
    "redi"         0.15
    "ưỡng"         0.15
    "AFE"          0.14
    "lí"           0.14
    "ीय"           0.14
    "urahan"       0.14
    " тв"          0.14
    "чя"           0.14
    Activation Density: 0.112%

    No Known Activations