Explanations
questions or statements containing contractions
Negative Logits
Token        Logit
ortunately   -0.91
withd        -0.86
exha         -0.83
rall         -0.82
newcom       -0.78
anwhile      -0.78
eleph        -0.74
Þ            -0.71
exting       -0.71
confir       -0.70
Positive Logits
Token        Logit
't           1.67
ny           0.92
ovan         0.90
ada          0.87
ÃŃ           0.86
ned          0.78
athan        0.78
ALD          0.77
thia         0.77
na           0.77
Activations Density: 0.055%