INDEX
    Explanations

    the word "for" and variations of it within different contexts

    Negative Logits
    Token            Logit
    " كومونز"        -0.70
    " itſelf"        -0.61
    " متحده"         -0.58
    " scoperta"      -0.56
    " pleaſure"      -0.56
    "styleType"      -0.56
    " houſe"         -0.55
    " presentada"    -0.53
    " yyl"           -0.53
    " ſche"          -0.53
    Positive Logits
    Token            Logit
    "icoot"          0.65
    "umpad"          0.56
    "yses"           0.56
    " demi"          0.54
    "antren"         0.53
    " noDo"          0.53
    " larynx"        0.51
    "sizeCache"      0.51
    "openzeppelin"   0.51
    " ل"             0.51
    Act Density 0.350%

    No Known Activations