INDEX
Explanations
dates or time references
the definite article "the" in various contexts
Negative Logits
ageddon    -0.72
geist      -0.71
rage       -0.66
ceive      -0.65
eno        -0.65
thood      -0.64
nut        -0.63
iffe       -0.62
these      -0.60
iod        -0.60
Positive Logits
latter      1.15
largest     1.09
longest     1.02
smallest    1.01
biggest     1.01
latest      0.98
heaviest    0.97
oldest      0.97
same        0.95
earliest    0.94
Activations Density: 0.133%