    Explanations
    No Explanations Found
    Negative Logits
    Token            Logit
    " scrut"         -0.85
    "verty"          -0.79
    "IFA"            -0.76
    "earch"          -0.73
    "ionage"         -0.72
    " Ethiop"        -0.72
    " fundament"     -0.72
    "FTWARE"         -0.69
    " unlaw"         -0.67
    "ILL"            -0.66
    Positive Logits
    Token            Logit
    "ga"             0.77
    "nation"         0.72
    "god"            0.70
    "flo"            0.67
    "reb"            0.65
    " knock"         0.64
    "atl"            0.64
    "nam"            0.64
    "gan"            0.63
    "atra"           0.62
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.