    Negative Logits
    Token           Logit
                    -0.07
    " trở"          -0.07
    "ưỡng"          -0.06
    " produto"      -0.06
    " Dawn"         -0.06
    "968"           -0.06
    " zev"          -0.06
    " attendees"    -0.06
                    -0.06
    " novice"       -0.06
    Positive Logits
    Token           Logit
    " fil"          0.18
    " Fil"          0.13
    "Fil"           0.13
    " Phil"         0.11
    "fil"           0.11
    " filament"     0.11
    "Phil"          0.10
    ".fil"          0.10
    " filthy"       0.09
    "_fil"          0.09
    Activation Density: 0.009%

    No Known Activations