INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token             Logit
    "ulner"           -0.65
    " fog"            -0.63
    " sidx"           -0.62
    " decomp"         -0.62
    "Integ"           -0.60
    " eclipse"        -0.60
    " numb"           -0.60
    " rust"           -0.59
    " Airbnb"         -0.59
    " collectively"   -0.59

    Positive Logits
    Token             Logit
    "kered"           0.94
    "olic"            0.89
    "awed"            0.76
    "shaw"            0.74
    "iliary"          0.70
    "ocrats"          0.69
    "reditary"        0.69
    "elfare"          0.69
    "het"             0.68
    "zag"             0.68

    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.