INDEX
    Explanations

    phrases that indicate direction or movement towards a specific goal or concept

    Negative Logits
    "íģ¼"        -0.15
    "/w"         -0.15
    "orca"       -0.14
    "/her"       -0.14
    "culate"     -0.14
    "aac"        -0.14
    "oretical"   -0.14
    "kır"        -0.13
    " <<-"       -0.13
    "ctime"      -0.13

    Positive Logits
    "/from"      0.23
    "/about"     0.22
    "gether"     0.19
    "wards"      0.19
    " whom"      0.18
    "GGLE"       0.18
    " towards"   0.18
    "ness"       0.17
    "sWith"      0.17
    " Towards"   0.17
    Activation Density: 0.024%

    No Known Activations