INDEX
Explanations
proper names or nouns representing individuals
names of individuals, particularly in professional or notable contexts
Negative Logits
Token            Logit
advertisement    -0.77
conservancy      -0.73
¥µ               -0.72
ombat            -0.71
actionDate       -0.71
duration         -0.69
peed             -0.68
qqa              -0.68
category         -0.67
fal              -0.67
Positive Logits
Token            Logit
's               0.94
Doe              0.90
remembers        0.89
enjoys           0.88
was              0.87
knows            0.86
recalls          0.85
Sr               0.85
wears            0.82
prefers          0.82
Activations Density: 0.232%