Explanations
contrasting or unexpected continuations
Negative Logits
ель    0.88
раст    0.84
ليل    0.75
اردوش    0.75
ي    0.73
ి    0.72
confertim    0.68
лай    0.67
سم    0.66
چل    0.66
Positive Logits
b    0.78
am    0.77
k    0.77
it    0.72
et    0.70
ate    0.70
_    0.70
ও    0.69
can    0.68
fte    0.68
Activations Density 0.004%