Explanations
occurrences of the word "New."
Negative Logits
Token   Logit
Ļ       -0.15
apo     -0.15
compat  -0.15
829     -0.15
ainer   -0.15
жив     -0.15
δÏģο    -0.14
oner    -0.14
orean   -0.14
ampo    -0.14
Positive Logits
Token      Logit
Zealand    0.31
York       0.26
Delhi      0.24
Orleans    0.22
castle     0.22
Scientist  0.22
ìļķ        0.21
chw        0.20
swire      0.20
york       0.19
Activations Density: 0.049%