INDEX
Explanations
instances of the word 'dollar'
references to various monetary values
NEGATIVE LOGITS
Token     Logit
Prol      -0.72
lov       -0.67
etics     -0.66
Sakuya    -0.66
Detect    -0.63
Dev       -0.62
dev       -0.60
Ober      -0.60
Guard     -0.60
Mess      -0.58
POSITIVE LOGITS
Token     Logit
dollar    3.94
Dollar    2.79
dollar    2.64
dollars   2.35
Dollars   1.98
dime      1.74
ollar     1.69
penny     1.64
pound     1.59
yen       1.49
Activations Density 0.007%
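
To make the density figure concrete, here is a minimal sketch that assumes activation density means the fraction of token positions on which the feature fires above zero. That definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: the dashboard may compute density differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 7 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.007% corresponds to roughly 7 active tokens per 100,000, which fits a sparse feature tied to a narrow set of currency words.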