    Explanations

    references to faith or mentions of the concept of faith

    Negative Logits
    Token       Logit
    " numel"    -0.14
    "erton"     -0.14
    "burgh"     -0.14
    "pit"       -0.14
    "elere"     -0.14
    "ippy"      -0.14
    "chop"      -0.14
    "æĴ"        -0.14
    "urgeon"    -0.13
    "ืà¹ī"       -0.13
    Positive Logits
    Token       Logit
    " fa"       0.28
    " FA"       0.22
    " Fa"       0.22
    "/fa"       0.20
    "ust"       0.19
    "ulk"       0.19
    "аÑĢÑĮ"      0.18
    "ifax"      0.17
    "ience"     0.17
    "Fa"        0.17
    Act Density 0.011%

    No Known Activations