    Explanations
    No Explanations Found
    Negative Logits
        Token              Logit
        " regex"           -0.71
        " CPC"             -0.71
        " incent"          -0.69
        "EGIN"             -0.69
        " ethical"         -0.67
        " shoulders"       -0.66
        "uliffe"           -0.65
        " proverb"         -0.65
        " prohibitions"    -0.65
        " constitu"        -0.65
    Positive Logits
        Token              Logit
        "waukee"            0.83
        "Ke"                0.80
        "brew"              0.77
        "cakes"             0.77
        "her"               0.75
        "quart"             0.75
        "jen"               0.74
        "ju"                0.73
        "fig"               0.72
        "STAR"              0.72
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.