INDEX
    Explanations

    phrases encouraging communication and outreach

    Negative Logits
    "FSIZE"     -0.17
    "emmel"     -0.15
    "allon"     -0.14
    "λή"        -0.14
    "enheim"    -0.14
    "庫"        -0.14
    "rema"      -0.14
    "ocht"      -0.14
    "_DF"       -0.14
    "íĿ"        -0.14
    Positive Logits
    " free"       0.55
    "free"        0.40
    "-free"       0.35
    "_free"       0.35
    "\tfree"      0.33
    " свобод"     0.33
    "Free"        0.32
    " Free"       0.32
    " FREE"       0.31
    " welcome"    0.30
    Activation Density: 0.015%

    No Known Activations