Explanations
descriptive phrases indicating typical examples or characteristics of subjects
Negative Logits
dieux            -0.52
pères            -0.49
religieuses      -0.49
verheir          -0.49
zewnętrzne       -0.48
bağlantılar      -0.47
lägen            -0.46
ButtonModule     -0.46
gröss            -0.46
électroniques    -0.45
Positive Logits
typical          1.77
Typical          1.73
Typical          1.67
typical          1.66
typique          1.38
典型             1.34
типи             1.29
típico           1.26
TYP              1.23
típica           1.20
Activations Density: 0.320%