    Explanations

    patterns in phrases and words

    Negative Logits
     Relief       0.44
     downward     0.39
     Phill        0.38
     hali         0.37
    Click         0.37
    φ             0.36
     Blond        0.36
     Φ            0.35
     downwards    0.35
     SOS          0.34
    Positive Logits
     disadvant    0.44
     metadata     0.41
     издания      0.41
     ประจํา       0.40
    порта         0.40
     razvoj       0.40
    мый           0.40
    ваш           0.40
    unki          0.40
     epidemi      0.39
    Activation Density: 0.001%

    No Known Activations