INDEX
    Explanations

    the beginning of segments in text, indicated by the token '<bos>'

    Negative Logits
    Token                Logit
    'Vidite'             -1.27
    ' Италијани'         -1.02
    ' Wikimedijinoj'     -0.89
    'featureID'          -0.87
    ' tartalomajánló'    -0.83
    ' himo'              -0.83
    'RTEE'               -0.83
    'ImageContext'       -0.83
    'Geplaatst'          -0.81
    'Portale'            -0.80
    Positive Logits
    Token                Logit
    '2'                  0.65
    '1'                  0.56
    '3'                  0.51
    '</h2>'              0.51
    '0'                  0.49
    '4'                  0.47
    '5'                  0.47
    '7'                  0.45
    ' comfort'           0.45
    '8'                  0.44
    Act Density 0.000%

    No Known Activations