INDEX
Explanations
definite articles 'the' that indicate importance or specificity
instances of repetition in phrases or ideas
Negative Logits
Ò        -0.80
imi      -0.76
ceive    -0.76
thood    -0.75
arate    -0.71
1200     -0.70
acho     -0.69
aba      -0.69
bg       -0.69
antes    -0.68
Positive Logits
biggest    1.25
vast       1.23
latter     1.22
simplest   1.22
slightest  1.21
easiest    1.15
oret       1.15
majority   1.14
quickest   1.12
greatest   1.11
Activations Density 0.587%