INDEX
    Explanations

    adjectives following nouns

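    Negative Logits
    Token                  Value
    (token not shown)      0.59
    " ре"                  0.54
    "iva"                  0.54
    " ﺍﻟ"                  0.52
    " Հ"                   0.51
    "ica"                  0.50
    "aning"                0.49
    " குறிப்பிட்ட"          0.49
    "amous"                0.49
    "ohner"                0.49

    Positive Logits
    Token                  Value
    "رفه"                  0.45
    "λευ"                  0.44
    "stopPropagation"      0.42
    (token not shown)      0.41
    (token not shown)      0.40
    (token not shown)      0.39
    (token not shown)      0.38
    (token not shown)      0.38
    "上下"                 0.38
    (token not shown)      0.38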
    Act Density 0.006%
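
For context, a figure like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation. This is an illustrative definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 6 of which fire.
sample = [0.0] * 99_994 + [1.2, 0.7, 3.1, 0.4, 2.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.006%
```

Under this reading, a density of 0.006% corresponds to roughly 6 firing positions per 100,000 sampled tokens, which would fit a feature for which few or no example activations have been recorded.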

    No Known Activations