INDEX
    Explanations

    proper nouns or named entities

    phrases related to naming examples or instances

    Negative Logits
    arnaev          -0.80
    ogn             -0.72
    earable         -0.66
    fram            -0.65
    ORGE            -0.62
    iership         -0.62
    urgy            -0.61
    guided          -0.60
    mes             -0.58
    sup             -0.58
    Positive Logits
    instance        0.63
     afar           0.61
    onest           0.60
    laughs          0.58
    };              0.57
     briefly        0.57
     Clicker        0.57
    ENCY            0.55
    Arcade          0.55
     approximation  0.55
    Act Density 0.094%

    No Known Activations