INDEX
    Explanations

    nouns that signify significant actions or effects

    Negative Logits
    Token           Logit
    " Tome"         -0.17
    "ongan"         -0.16
    "ilan"          -0.15
    "-ÐĽ"           -0.14
    "_lv"           -0.14
    "RAP"           -0.14
    "ruba"          -0.14
    ".timeScale"    -0.14
    "otos"          -0.14
    "ë¦"            -0.14
    Positive Logits
    Token           Logit
    "ses"           0.17
    ".sam"          0.16
    "pling"         0.15
    " ele"          0.15
    "-fetch"        0.14
    " Pitt"         0.14
    "rott"          0.14
    " kari"         0.14
    " Fro"          0.14
    " fro"          0.14
    Activation Density: 0.010%

    No Known Activations