INDEX
    Explanations

    phrases indicating the presence of examples or lists

    Negative Logits
    icont    -0.14
    ần    -0.14
    aviest    -0.14
    ="__    -0.14
    yx    -0.13
    ä»ķ    -0.13
    å¼ķãģį    -0.13
    ختÙĩ    -0.13
    à¥įमà¤ļ    -0.13
     tranh    -0.12
    Positive Logits
     examples    0.40
     example    0.37
     sample    0.35
     some    0.32
    examples    0.32
    Examples    0.32
     Examples    0.31
     exemp    0.31
     samples    0.31
     Sample    0.29
    Activation Density: 0.110%

    No Known Activations