INDEX
Explanations
updates and news-related terms
references to updates or notifications about content
Negative Logits
Token    Logit
aden     -0.90
uana     -0.78
enic     -0.76
rongh    -0.74
orney    -0.73
nered    -0.72
otten    -0.71
bered    -0.71
fruit    -0.70
inh      -0.69
Positive Logits
Token       Logit
Update      1.12
Update      1.09
UPDATE      1.05
UPDATE      0.92
CLAIM       0.82
EDIT        0.82
Timeline    0.81
Deadline    0.81
Edit        0.80
update      0.78
Activations Density: 0.012%