Explanations
phrases that indicate quantities, preferences, and personal references
Negative Logits
pek          -0.15
leta         -0.14
bish         -0.14
IFICATIONS   -0.13
ipel         -0.13
ffen         -0.13
oner         -0.13
Âį           -0.13
isay         -0.13
xfb          -0.13
Positive Logits
specific     1.03
specific     0.90
particular   0.83
Specific     0.82
Specific     0.82
específ      0.81
-specific    0.80
_specific    0.76
specifically 0.73
pecific      0.73
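These lists give, for each token, how strongly the feature pushes its logit up or down. The page does not say how the values were computed; a common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k largest and k smallest results. A minimal stdlib sketch under that assumption (the function names and toy vectors are hypothetical, not real model weights):

```python
# Hypothetical sketch: rank tokens by a feature's effect on their logits.
# Assumption: effect = dot(decoder direction, token's unembedding vector).

def logit_effects(decoder_dir, unembed):
    """Map each token to the dot product of decoder_dir with its unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return positive, negative


# Toy example with made-up 2-D vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "specific": [0.8, 0.4],
    "particular": [0.6, 0.3],
    "pek": [-0.1, -0.1],
    "bish": [-0.04, -0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.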
Activation density: 0.365%
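The density figure can be read as the share of token positions on which the feature fires, so 0.365% is about 365 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold, the function name `activation_density`, and the sample data are assumptions, not taken from the page:

```python
# Hypothetical sketch: activation density as the share of token positions
# where the feature fires. Assumption: "fires" means activation > threshold.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: 365 firing positions out of 100,000 (made-up data).
sample = [0.0] * 99_635 + [1.2] * 365
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.365%
```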