INDEX
Explanations
numbers and numerical expressions
Negative Logits
Token    Logit
0        -0.99
es       -0.81
2        -0.78
3        -0.74
5        -0.73
4        -0.73
X        -0.71
9        -0.70
6        -0.68
os       -0.65
Positive Logits
Token         Logit
ſeveral       1.40
itſelf        1.36
themſelves    1.36
purpoſe       1.34
ſelves        1.34
ſmall         1.32
ſelf          1.32
himſelf       1.32
againſt       1.30
reaſon        1.30
Activations Density: 0.092%