INDEX
Explanations
historical and conversational language
Negative Logits
ු  0.49
чыць  0.47
czas  0.45
mandates  0.43
ور  0.42
чей  0.42
قي  0.41
عند  0.41
tells  0.41
convo  0.41
Positive Logits
अत्यधिक  0.48
Forms  0.46
váy  0.44
Pleistocene  0.43
GEN  0.43
री  0.42
contrib  0.42
spontaneously  0.42
अच्छी  0.42
Proven  0.41
Activations Density: 0.003%