INDEX
    Explanations

    verbs related to actions in research or experimental contexts

    Negative Logits
    Token              Logit
    rör                -0.69
    houſe              -0.67
    ſmall              -0.66
    ModelExpression    -0.66
    getIs              -0.66
    Houſe              -0.63
    nahilalakip        -0.61
    متعلقه             -0.61
    miſ                -0.61
    cauſe              -0.60
    Positive Logits
    Token              Logit
    in                 0.86
    via                0.72
    by                 0.71
    with               0.70
    on                 0.69
    at                 0.69
    through            0.66
    between            0.60
    as                 0.59
    only               0.58
    Activation Density: 0.739%

    No Known Activations