INDEX
Explanations
mentions of the word 'dim'
references to "dimorphism" and its variations
Negative Logits
Token     Logit
hips      -0.84
CRIP      -0.79
cially    -0.70
ciating   -0.66
EMENT     -0.63
ALLY      -0.61
omes      -0.60
spr       -0.59
Rouge     -0.58
Desk      -0.58
Positive Logits
Token     Logit
inished   1.61
itri      1.38
ethy      1.35
ming      1.28
ples      1.25
orph      1.24
ensions   1.21
pled      1.20
med       1.17
pling     1.13
Activations Density: 0.046%