INDEX
    Explanations
    Negative Logits
    Token         Logit
    "\tcmd"       -0.07
    " dic"        -0.07
    " familia"    -0.06
    "ني"          -0.06
    " kendini"    -0.06
    " dim"        -0.06
    "\tbool"      -0.06
    "JI"          -0.06
    " канди"      -0.06
    " Choi"       -0.06
    Positive Logits
    Token         Logit
    " over"       0.25
    " Over"       0.23
    "Over"        0.20
    " OVER"       0.19
    "over"        0.18
    "-over"       0.17
    "_over"       0.16
    "OVER"        0.16
    " across"     0.12
    "_OVER"       0.12
    Activation Density: 0.090%

    No Known Activations