INDEX
Explanations
newly introduced terms or concepts
mentions of the word "new."
Negative Logits
ality    -0.67
bey      -0.64
nikov    -0.63
zzi      -0.61
/        -0.60
bys      -0.60
endez    -0.59
rior     -0.59
antry    -0.59
hunt     -0.58
Positive Logits
new          3.15
new          2.32
newly        1.99
newest       1.86
newer        1.74
newfound     1.71
NEW          1.70
newcomers    1.55
NEW          1.48
fresh        1.46
Activations Density 0.102%
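
As a rough illustration of what a density figure like the one above could mean, the sketch below treats activation density as the fraction of token positions where the feature's activation exceeds a threshold. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not say how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.102% would correspond to the feature firing on roughly one token in a thousand.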