Explanations
descriptive modifiers followed by nouns
Negative Logits
ಎಲ್ಲಾ  0.44
усіх  0.41
endearing  0.38
식으로  0.36
እና  0.35
фаразлары  0.35
всіх  0.35
всех  0.35
)$  0.34
mọi  0.34
Positive Logits
e  0.42
ين  0.42
ka  0.40
Have  0.40
ي  0.39
6  0.37
2  0.36
5  0.36
7  0.35
Task  0.35
Activations Density: 0.551%