Explanations
references to popularity or prevalence in a social context
numbers, quantities, prices
Negative Logits
Tiefen          -0.38
Schicks         -0.38
Helden          -0.37
ٔ               -0.35
ahor            -0.33
aldea           -0.33
historically    -0.32
Auss            -0.32
↵               -0.31
Ruhe            -0.31
Positive Logits
שוליים          0.81
queſta          0.79
rungsseite      0.73
<unused41>      0.69
<unused14>      0.68
<unused8>       0.68
<unused43>      0.68
<unused47>      0.68
[@BOS@]         0.68
<unused3>       0.68
Activations Density: 0.001%