INDEX
Explanations
references to specific entities or proper nouns, particularly related to military or news
uppercase letters or titles that indicate significant subject matter
Negative Logits
Noir        -0.82
CPC         -0.76
Carnival    -0.65
Solitaire   -0.65
bottleneck  -0.64
Pax         -0.62
sinks       -0.62
Scarlet     -0.62
Codex       -0.61
CDs         -0.61
Positive Logits
prising     1.33
PDATED      1.24
nexpected   1.24
seless      1.23
pperc       1.07
CLA         0.99
gly         0.98
mpire       0.97
ntil        0.97
prise       0.96
Activations Density: 0.056%