INDEX
Explanations
prepositions followed by nouns
Negative Logits
लं       0.44
est      0.42
alam     0.41
género   0.40
боре     0.40
lage     0.37
estés    0.37
стой     0.37
lun      0.36
ruž      0.35
Positive Logits
самого      0.43
року        0.42
середи      0.41
свого       0.41
стороны     0.40
skepticism  0.40
strany      0.40
സിയ         0.39
ơn          0.39
대로        0.38
Activations Density: 0.002%