INDEX
Explanations
proper nouns or names with common words in between
references to brands or entities in the context of discussions about products or consumer behavior
Negative Logits
ÂŃ          -0.86
Amtrak      -0.57
earchers    -0.53
Washington  -0.53
hower       -0.52
ifestyle    -0.52
fracking    -0.51
Washington  -0.51
orest       -0.51
vernment    -0.51
Positive Logits
[+          0.73
canon       0.69
plagiar     0.64
Variant     0.63
english     0.61
deleted     0.61
deletion    0.60
verse       0.60
cipher      0.60
alot        0.59
Activations Density: 2.263%