INDEX
    Explanations

    positive adjectives followed by nouns

    Negative Logits
    Token             Value
    "ini"             0.46
    " Logos"          0.43
    "ico"             0.39
    " Locking"        0.38
    "ida"             0.37
    " Hyper"          0.37
    "Hyper"           0.37
    "ulagway"         0.36
    "akkhati"         0.36
    "ino"             0.36

    Positive Logits
    Token             Value
    " puzzling"       0.45
    " aead"           0.40
    " domain"         0.39
    " physic"         0.38
    " даже"           0.38
    " quantitative"   0.38
    " veg"            0.38
    " booze"          0.38
    " ail"            0.37
    " chieft"         0.37
    Activation Density 0.000%

    No Known Activations