INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token                  Logit
    "存于互联网档案馆"      -0.53   (Chinese: "archived at the Internet Archive")
    " fubject"             -0.48
    " pleaſure"            -0.45
    "niad"                 -0.44
    "<bos>"                -0.44
    "COMMENT"              -0.44
    " houſe"               -0.43
    " ſtate"               -0.43
    "Hobby"                -0.43
    "Preference"           -0.42
    Positive Logits
    Token                  Logit
    "s"                    0.76
    "ulates"               0.66
    "lishes"               0.66
    "enters"               0.65
    "ixes"                 0.63
    "ontes"                0.63
    "tifies"               0.63
    "loses"                0.63
    "lizes"                0.63
    "rens"                 0.62
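    The page does not say how these two lists were produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The following is a minimal NumPy sketch under that assumption; `top_logit_effects`, `w_dec_row`, and `W_U` are hypothetical names, and the random toy matrices stand in for real model weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper. Assumes w_dec_row is the feature's decoder direction
    (shape [d_model]) and W_U is the unembedding matrix (shape [d_model, vocab_size]).
    """
    logits = w_dec_row @ W_U              # direct effect of the feature on each token's logit
    order = np.argsort(logits)            # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (not real model data).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
W_U = rng.standard_normal((d_model, vocab_size))
w_dec_row = rng.standard_normal(d_model)
neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=3)
```

    Real dashboards may normalize the decoder vector or apply other preprocessing first; the sketch shows only the plain projection.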
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.
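    A density of 0.000% is consistent with a feature that never fired on the sampled tokens, which matches the empty activation list. The sketch below assumes density is the percentage of sampled token positions whose activation is strictly above a threshold of zero; the page does not state its exact threshold or sample, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Hypothetical helper; threshold=0.0 is an assumption, not the page's stated rule.
    """
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        return 0.0                        # no samples: report zero density
    return 100.0 * float(np.mean(acts > threshold))

dead = np.zeros(10_000)                   # a feature that never fires
print(f"{activation_density(dead):.3f}%")                  # prints 0.000%
print(f"{activation_density([0.0, 0.5, 0.0, 1.2]):.3f}%")  # prints 50.000%
```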