    Explanations

    starts of titles or phrases

    Negative Logits
    Token              Logit
    " новый"           -1.09
    " новой"           -1.00
    " nowy"            -0.98
    " quarantined"     -0.94
    " nový"            -0.94
    " الخاص"           -0.94
    " czerwony"        -0.94
    " phê"             -0.93
    (token not shown)  -0.93
    "ového"            -0.93
    Positive Logits
    Token              Logit
    " Only"             1.36
    " They"             1.23
    " That"             1.05
    " Those"            1.04
    " Just"             1.02
    "%',"               1.02
    " Most"             0.99
    " As"               0.98
    " Both"             0.98
    " Some"             0.97
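The sketch below shows one common way token logit lists like these are produced. It assumes the page describes a sparse-autoencoder (or similar) feature and that each token's logit is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The page itself does not state this. The function name `feature_logit_effects`, the toy vocabulary, and the weights are all hypothetical.

```python
from typing import Sequence


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive token logits for a feature.

    Assumption (hypothetical setup): a token's logit is the dot product of the
    feature's decoder direction (length d_model) with that token's column of
    the unembedding matrix (d_model rows x vocab_size columns).
    """
    if k < 1:
        raise ValueError("k must be positive")
    d_model = len(decoder_row)
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Take the k largest and list them from most positive downward.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with hypothetical weights (d_model = 2, vocabulary of 4 tokens).
vocab = [" Only", " They", " новый", " quarantined"]
decoder_row = [1.0, -0.5]
unembed = [
    [1.0, 0.2, -1.0, -0.2],
    [0.0, -1.0, 0.2, 1.0],
]
negative, positive = feature_logit_effects(decoder_row, unembed, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.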
    Activation Density 0.050%
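An activation density of 0.050% is a fraction of 0.0005, which works out to roughly 5 activating tokens per 10,000. The sketch below assumes density means the share of sampled tokens on which the feature's activation is strictly above zero. Real dashboards may use a different cutoff or sample. The function `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: a token counts as active when the feature's activation exceeds
    zero (the default threshold); the dashboard's exact rule is not stated.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 tokens, of which 10 activate the feature.
acts = [0.0] * 19_990 + [1.5] * 10
density = activation_density(acts)
print(f"{density:.3%}")  # 10 / 20,000 = 0.0005, printed as 0.050%
```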

    No Known Activations