Explanations
mentions of newness or innovation
Negative Logits
other     -0.15
pole      -0.14
Ùĩ        -0.14
asks      -0.14
wise      -0.14
à¥Ģय      -0.14
nia       -0.13
ogl       -0.13
/h        -0.13
/video    -0.13
Positive Logits
swire      0.26
-found     0.22
bies       0.21
sworth     0.20
ish        0.19
foundland  0.19
Zealand    0.18
/new       0.18
letters    0.18
é²ľ        0.18
Activations Density 0.139%