INDEX
Explanations
instances of the word "plus" accompanied by specific numerical values
instances of the word "plus" and its variations, indicating an additive or cumulative context
Negative Logits
Token      Logit
dfx        -0.81
ruary      -0.75
adr        -0.71
ugu        -0.71
OCK        -0.69
utable     -0.67
ollar      -0.66
clinton    -0.66
abies      -0.65
NPR        -0.65
Positive Logits
Token      Logit
cules      1.02
minus      0.88
bonuses    0.75
bonus      0.74
henko      0.74
/-         0.73
infinity   0.72
extras     0.69
lihood     0.68
assorted   0.65
Activations Density: 0.033%
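
The density figure can be read as the share of token positions on which the feature fires. The sketch below assumes that reading (activation strictly above a threshold of zero); the function name, the threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 33 of which activate the feature.
sample = [0.0] * 99_967 + [1.0] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts the feature is active on 33 of 100,000 positions, which is the same order of sparsity as the figure reported above.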