INDEX
Explanations
Verbs followed by "and" or a comma
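NEGATIVE LOGITS
heritage: 0.45
hanging: 0.43
itp: 0.42
Մ (Armenian capital letter Men): 0.42
uniquement (French, "only"): 0.40
plays: 0.39
也可以 (Chinese, "can also"): 0.39
संबंधी (Hindi, "related to"): 0.38
或其他 (Chinese, "or other"): 0.38
march: 0.38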
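POSITIVE LOGITS
आणि (Marathi, "and"): 1.52
અને (Gujarati, "and"): 1.48
and: 1.45
এবং (Bengali, "and"): 1.44
और (Hindi, "and"): 1.42
và (Vietnamese, "and"): 1.42
และ (Thai, "and"): 1.40
ਅਤੇ (Punjabi, "and"): 1.40
και (Greek, "and"): 1.38
и (Russian and other Cyrillic-script languages, "and"): 1.34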
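A minimal sketch of how such logit lists can be produced, assuming (not confirmed by this page) the common convention that a feature's effect on each output token is the dot product of the feature's decoder direction with that token's unembedding vector, ranked to give the top positive and most negative tokens. The function names `logit_effects` and `top_and_bottom` and all weights below are hypothetical toy values, not the real model's.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed convention: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted tokens and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # most negative effect first
    return positive, negative

# Hypothetical toy weights, invented for illustration only.
decoder_dir = [1.0, 0.5]
unembed = {
    "and": [1.2, 0.6],
    "और": [1.0, 0.8],
    "plays": [-0.3, -0.2],
    "march": [-0.1, -0.4],
}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(pos)
print(neg)
```

With these toy weights, the "and"-like tokens end up at the positive end and the unrelated tokens at the negative end, mirroring the shape of the lists shown on this page.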
Activation density: 0.061%
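Activation density is typically reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, and uses a hypothetical helper `activation_density` with made-up activations: 61 firing positions out of 100,000 tokens give the 0.061% figure shown here.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical data: the feature fires on 61 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 61 * 1000, 1000):
    acts[i] = 2.5

print(f"{activation_density(acts):.3f}%")  # prints "0.061%"
```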