INDEX
    Explanations

    conversations centered around friendship and relationships

    Negative Logits
    Token        Logit
    "adolu"      -0.16
    "_SUPPORT"   -0.16
    "sah"        -0.15
    "iyon"       -0.15
    "meyi"       -0.14
    " vyjád"     -0.14
    "tir"        -0.14
    "ilos"       -0.14
    "dex"        -0.14
    "ivet"       -0.14
    Positive Logits
    Token        Logit
    " tell"      0.47
    "tell"       0.42
    "告诉"       0.42
    " telling"   0.41
    " tells"     0.41
    " Tell"      0.41
    "Tell"       0.40
    " told"      0.39
    " Tells"     0.33
    ".tell"      0.30
    Activation Density: 0.478%

    No Known Activations