    Explanations

    abstract nouns after "of"

    Negative Logits
    Token       Value
    n           0.84
    ن           0.83
    ों           0.74
    ع           0.73
    ار          0.71
     piedras    0.71
    <0x99>      0.69
    ných        0.68
    the         0.67
                0.66
    Positive Logits
    Token       Value
    ".          0.57
    ",          0.55
    .",         0.53
    .}          0.53
     summ       0.53
    ."          0.52
    ত্ব          0.52
     oversight  0.52
    DA          0.50
     মধ্যে       0.50
    Activation Density: 0.264%

    No Known Activations