INDEX
Explanations
the letter 's' at the end of words
the letter "s."
Negative Logits
eur             -0.72
fatalities      -0.63
scares          -0.62
CVE             -0.62
headaches       -0.61
cair            -0.59
looms           -0.57
extermination   -0.57
mass            -0.56
Allied          -0.56
Positive Logits
nesday          1.02
ername          0.81
forth           0.78
abi             0.72
terday          0.72
omething        0.72
aper            0.72
!--             0.71
lightly         0.69
stairs          0.68
Activations Density 0.026%