    Explanations
    No Explanations Found
    Negative Logits
    Token           Logit
    " envy"         -0.67
    "arez"          -0.64
    "ropy"          -0.64
    "elvet"         -0.64
    " arrogance"    -0.63
    " compliments"  -0.62
    " obesity"      -0.62
    " frontrunner"  -0.61
    "pport"         -0.61
    " Deputy"       -0.61
    Positive Logits
    Token           Logit
    "ten"           0.85
    "adj"           0.81
    "sold"          0.79
    "techn"         0.78
    "location"      0.74
    "technical"     0.73
    "format"        0.73
    "portion"       0.72
    "bos"           0.69
    "angan"         0.69
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.