INDEX
    Explanations

    words related to dates and people, often in a context involving events or actions

    Negative Logits
    "s"                -0.95
    " springfox"       -0.68
    " Kob"             -0.68
    " Wikimédia"       -0.68
    "യും"              -0.66
    "色んな"           -0.66
    "はコチラ"         -0.64
    "xic"              -0.63
    "Filmo"            -0.62
    " tartalomajánló"  -0.61
    Positive Logits
    1.27
    1.06
    1.02
    1.02
    1.01
    0.99
    0.94
    0.92
    0.91
    0.91
    Activation Density 0.092%

    No Known Activations