INDEX
    Explanations
    No Explanations Found
    Negative Logits
    umbn          -0.81
     fortun       -0.73
     Gale         -0.69
     Gutenberg    -0.66
     obscurity    -0.65
     nomine       -0.64
     Philipp      -0.64
     funer        -0.63
     Playboy      -0.63
     Griffith     -0.63

    Positive Logits
    Train         0.86
    ÃŁ            0.84
    faced         0.76
    ski           0.75
    per           0.74
    rate          0.74
    Rate          0.73
    orbit         0.73
    hyp           0.72
    rocket        0.72

    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.