INDEX
    Explanations

    descriptors followed by nouns

    Negative Logits
    Token     Value
    " ("      2.40
              2.25
    "。("     2.23
    ".("      2.23
    " (!"     2.21
    "。("     2.16
    "!("      2.14
    " ($"     2.13
    "!("      2.11
    " (["     2.07

    Positive Logits
    Token     Value
    "?),"     1.68
    ")+"      1.67
    ")-"      1.65
    ")/"      1.64
    "%)"      1.64
    " %)"     1.63
    "%),"     1.62
    "?)"      1.59
    ")"       1.58
    ")・"     1.55
    Activation Density: 0.244%

    No Known Activations