Explanations
words indicating illumination or clarity, and numeric quantification or measurement
words related to investigative reporting or uncovering hidden information
popular movie titles along with their box office earnings
Negative Logits
referen         -0.50
[|              -0.39
redistributed   -0.39
Tokens          -0.39
$$$$            -0.39
ĪĴ              -0.38
idated          -0.37
equivalents     -0.37
Discussion      -0.37
Moroc           -0.36
Positive Logits
NetMessage       0.50
largeDownload    0.43
DRAGON           0.43
terness          0.43
taboola          0.41
pin              0.39
conom            0.37
utions           0.37
othes            0.37
brew             0.36
Activation density: 7.644%