    Explanations

    mentions of the word "family" or its variations

    Negative Logits
    "avar"      -0.15
    "iph"       -0.14
    "cho"       -0.14
    "638"       -0.14
    "jah"       -0.14
    " Yer"      -0.14
    "apor"      -0.14
    "843"       -0.14
    "tie"       -0.14
    " PN"       -0.14
    Positive Logits
    "æģĭ"       0.15
    "šak"       0.15
    "ynth"      0.14
    "å·¨"       0.14
    "dex"       0.14
    "EDA"       0.14
    "лоп"       0.14
    "ilin"      0.14
    "ë¡ł"       0.14
    "polator"   0.14
    Act Density 0.001%

    No Known Activations