INDEX
    Negative Logits
    Logit    Token
    -0.73    " Efq"
    -0.73    " pronouns"
    -0.68    " myſelf"
    -0.63    " ſche"
    -0.61    " ―――――"
    -0.60    "__':"
    -0.59    "المناصب"
    -0.59    " referenties"
    -0.59    " ſind"
    -0.58    " polygons"
    Positive Logits
    Logit    Token
     0.71    " similar"
     0.69    "<eos>"
     0.61    " his"
     0.60    " comparable"
     0.57    " its"
     0.57    " that"
     0.57    "similar"
     0.56    " Similar"
     0.56    " Infórmanos"
     0.55    "↵↵"
    Activation Density: 0.003%

    No Known Activations