INDEX
    Negative Logits
    ation
    -0.79
    son
    -0.72
    Portale
    -0.65
     Mans
    -0.63
     patri
    -0.59
    istic
    -0.59
    alities
    -0.57
     Lex
    -0.57
     cost
    -0.57
     המח
    -0.56
    Positive Logits
    >
    1.77
    ]>
    1.57
    }>
    1.54
     ?>">
    1.50
    }}>
    1.46
    )}>
    1.40
    ">
    1.37
    )>
    1.35
    >\n
    1.34
    '>
    1.29
    Act Density 0.153%

    No Known Activations