INDEX
    Explanations

    instances of deception or misrepresentation

    Negative Logits
    Token                 Logit
    " circuits"           -0.15
    "oren"                -0.15
    "_OBJ"                -0.14
    "celain"              -0.14
    "zza"                 -0.14
    " Deutsche"           -0.14
    "onne"                -0.14
    " cases"              -0.14
    "bs"                  -0.14
    " extr"               -0.14

    Positive Logits
    Token                 Logit
    "aul"                 0.16
    " miêu"               0.16
    "Ấ"                   0.15
    " Cabin"              0.15
    "άκ"                  0.15
    "лиÑĪ"                0.15
    "acho"                0.15
    " cellForRowAt"       0.14
    "_representation"     0.14
    "lea"                 0.14
    Activation Density: 0.003%

    No Known Activations