Explanations
the word "and" in various contexts
Negative Logits
SHIP        -0.83
udence      -0.79
amily       -0.76
hesda       -0.74
odcast      -0.73
Administ    -0.72
Deal        -0.71
Report      -0.71
Reviewer    -0.71
Administ    -0.71
Positive Logits
orange      1.13
yellow      1.13
striped     1.08
purple      1.06
grey        1.06
stripes     1.04
blue        1.03
gray        1.01
green       0.96
brown       0.95
Activations Density 0.037%