    Explanations
    No Explanations Found
    Negative Logits
     speaker     -0.67
     honour      -0.66
     convict     -0.66
     Wr          -0.65
     Advocate    -0.64
     Lump        -0.63
     Ecc         -0.62
    ï¸           -0.62
     Ancient     -0.62
     Sovere      -0.61
    Positive Logits
    ahime        0.92
    igree        0.89
    ensen        0.86
    aneers       0.82
    yrinth       0.81
    erella       0.80
    yahoo        0.79
    merce        0.79
    imi          0.78
    anya         0.78
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.