    Negative Logits
     요
    -0.78
     recovery
    -0.75
    Compre
    -0.74
    Dobrý
    -0.72
     Werbe
    -0.72
     Recovery
    -0.72
     recupero
    -0.70
    -0.70
    ロゴ
    -0.69
    Trat
    -0.68
    Positive Logits
     fuzz
    0.96
     Closure
    0.89
     Fuzz
    0.88
     closure
    0.80
    Fuzz
    0.79
    fuzz
    0.77
    Fuzzy
    0.76
     ווע
    0.75
    rboles
    0.73
    Gn
    0.73
    Activation Density: 0.020%

    No Known Activations