INDEX
Explanations
political or controversial terms and phrases
patterns related to possessive forms and contractions
Negative Logits
disadvant       -0.88
mathemat        -0.78
optimizations   -0.73
princ           -0.72
regul           -0.71
carbohyd        -0.70
pharmacy        -0.69
unborn          -0.68
pyramid         -0.68
fortun          -0.66
Positive Logits
ï¸ı             1.05
İ               0.85
ï¸              0.81
cffffcc         0.76
TPS             0.76
PF              0.75
VICE            0.74
fter            0.72
SHA             0.72
HQ              0.71
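
The first three entries in the positive list look like encoding debris, but they may be byte-level BPE token strings. The sketch below assumes a GPT-2-style byte-level tokenizer (an assumption; the source does not name the tokenizer) and maps such strings back to raw bytes. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Printable bytes map to themselves; the remaining bytes are assigned
    code points starting at U+0100, in increasing byte order.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string.

    Assumes the token was rendered with the table above (hypothetical
    for this dashboard). Raises KeyError for characters outside it.
    """
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["\u00ef\u00b8\u0131", "\u0130", "\u00ef\u00b8"]:
        raw = token_to_bytes(tok)
        # Partial UTF-8 sequences cannot be decoded on their own.
        text = raw.decode("utf-8", errors="replace")
        print(tok, raw.hex(), ascii(text))
```

Under that assumption, `ï¸ı` corresponds to the bytes EF B8 8F, which is the UTF-8 encoding of U+FE0F (variation selector-16); `ï¸` is the first two bytes of the same sequence, and `İ` is the single byte 0x8E, which is not valid UTF-8 by itself.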
Activations Density 0.328%