INDEX
Explanations
colors mentioned in a document
instances of coordination in complex phrases or lists
Negative Logits
Administ   -0.74
SHIP       -0.74
iets       -0.72
Report     -0.70
amily      -0.69
udence     -0.68
makers     -0.68
itant      -0.68
hesda      -0.67
Reviewer   -0.66
Positive Logits
striped    1.12
yellow     1.11
grey       1.09
orange     1.08
purple     1.07
gray       1.06
blue       0.99
green      0.95
violet     0.95
yellow     0.92
Activations Density 0.150%