INDEX
Explanations
references to comparison with a specific entity in a positive context
Negative Logits
isters     -0.84
onics      -0.76
acles      -0.72
ensibly    -0.70
staking    -0.69
ernels     -0.68
ethe       -0.67
igers      -0.65
undreds    -0.64
riages     -0.63
Positive Logits
worldly        1.45
aspect         0.97
circumstance   0.96
conceivable    0.95
outlet         0.87
entity         0.86
imaginable     0.85
mammal         0.85
where          0.84
iator          0.84
Activations Density: 0.036%