Explanations
numbers, specifically the number "ten" or close variations
instances of the word "ten" or references related to the number ten
Negative Logits
Token      Logit
aker       -0.73
jri        -0.70
ault       -0.70
asks       -0.68
Adds       -0.65
poke       -0.64
cation     -0.63
MQ         -0.63
Jackets    -0.63
ibles      -0.63
Positive Logits
Token      Logit
ten        3.11
fifteen    2.30
twenty     2.19
twelve     2.19
eleven     2.16
fourteen   2.10
thirty     2.08
thirteen   2.08
sixteen    2.05
fifty      2.02
Activations Density: 0.011%