    Negative Logits
    Token              Logit
    " WATCHED"         -0.65
    "audi"             -0.62
    "igious"           -0.61
    " Powers"          -0.60
    "vernment"         -0.60
    " neighbouring"    -0.60
    " Lomb"            -0.58
    "nesota"           -0.57
    "tested"           -0.57
    "unknown"          -0.57
    Positive Logits
    Token              Logit
    "cakes"            1.21
    " cake"            1.18
    "cake"             1.14
    " cakes"           1.03
    "ecake"            0.93
    "meal"             0.93
    " Cake"            0.88
    "fruit"            0.82
    "pillar"           0.79
    "xual"             0.75
    Activation Density: 0.010%

    No Known Activations