Explanations
words ending in "ate", "ated", "ize", and "ized", as well as a couple of other related words.
Negative Logits
"..\..\..\      -0.57
IMPORTED        -0.50
jstor           -0.49
"..\..\         -0.46
Old             -0.44
llary           -0.43
<<<<<<<<<<<<<<  -0.43
javas           -0.41
reactivex       -0.41
Brown           -0.41
Positive Logits
themſelves    0.87
itſelf        0.87
myſelf        0.85
raiſ          0.81
fometimes     0.81
neceff        0.74
betweenstory  0.73
himſelf       0.72
Jefus         0.72
ſelves        0.71
Activations Density 2.512%