INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token        Logit
    "çĽ"         -0.66
    " Kag"       -0.65
    "ova"        -0.65
    "DEP"        -0.63
    "LESS"       -0.62
    "Bir"        -0.62
    "etting"     -0.62
    "Cla"        -0.62
    " Lauder"    -0.61
    " Fla"       -0.61
    Positive Logits
    Token        Logit
    "udos"       0.81
    "soever"     0.77
    "yip"        0.77
    "isine"      0.76
    "anoia"      0.73
    "gins"       0.72
    "zie"        0.72
    "tain"       0.70
    "alan"       0.70
    "rency"      0.69
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.