INDEX
    Explanations

    references to specific concepts or practices within different belief systems or religions

    Negative Logits
    Token          Logit
    "*:"           -0.70
    "!."           -0.65
    "!:"           -0.64
    "+."           -0.63
    "';"           -0.59
    ".:"           -0.59
    ":,"           -0.58
    " although"    -0.57
    "jri"          -0.53
    "*."           -0.53
    Positive Logits
    Token          Logit
    "pires"        0.79
    "pired"        0.72
    " differed"    0.48
    " mattered"    0.48
    "ihadi"        0.47
    " might"       0.46
    " FF"          0.45
    " entails"     0.45
    "Script"       0.44
    " EVs"         0.44
    Activation Density: 0.725%

    No Known Activations