INDEX
Explanations
instances of the word "Four"
references to the term "Four"
Negative Logits
rad      -0.66
utm      -0.62
dmg      -0.61
spam     -0.60
ILA      -0.59
etting   -0.58
confuse  -0.58
erv      -0.57
answ     -0.56
err      -0.56
Positive Logits
Four     3.45
Four     2.36
Six      2.28
Eight    2.19
Five     2.16
Three    2.14
Fif      2.09
Seven    1.89
Forty    1.84
Twelve   1.82
Activations Density 0.008%