INDEX
Explanations
states of being or relations
NEGATIVE LOGITS
Cómo  0.46
الذين  0.43
cómo  0.43
неуда  0.42
অনেক  0.41
аспек  0.41
কীভাবে  0.40
отдельных  0.40
하지만  0.39
ங்களுடன்  0.39
POSITIVE LOGITS
contains  0.89
consists  0.83
bukanlah  0.76
originates  0.75
belongs  0.75
lacks  0.75
bevat  0.75
corresponds  0.73
includes  0.72
has  0.70
Activations Density: 0.012%