INDEX
Explanations
words after decline, down, or illegal
Negative Logits
carer  0.43
banknotes  0.42
bourgeoisie  0.40
impotence  0.38
diarrhoea  0.37
difficulties  0.37
например  0.36
strikingly  0.36
debtors  0.35
morphism  0.35
Positive Logits
kiddos  0.62
はもちろん  0.57
हमारी  0.54
جميع  0.54
всех  0.53
當然  0.53
våra  0.53
tentunya  0.52
تمامی  0.52
всіх  0.52
Activations Density 0.090%