INDEX
Explanations
mentions of something new or recently discovered
the word "newly" and its variations
Negative Logits
Token       Logit
ĺħ          -0.75
orem        -0.75
ocene       -0.74
raints      -0.73
rice        -0.72
sqor        -0.72
retty       -0.69
Citation    -0.68
ashtra      -0.68
igraph      -0.68
Positive Logits
Token       Logit
wed         1.01
bie         0.90
appointed   0.89
ãĤ»         0.82
liberated   0.82
foundland   0.80
cedented    0.79
acquired    0.79
mint        0.78
dubbed      0.78
Activations Density: 0.016%