INDEX
Explanations
proper nouns such as names and locations
mentions of individuals and their achievements or milestones
Negative Logits
hov         -0.71
[];         -0.63
rect        -0.62
ategory     -0.62
Strauss     -0.62
gom         -0.60
abetic      -0.60
Metatron    -0.59
cled        -0.58
rought      -0.58
Positive Logits
own           1.18
debut         0.87
OWN           0.77
impression    0.77
selves        0.76
contribution  0.75
allowance     0.75
displeasure   0.74
self          0.72
foothold      0.72
Activations Density 0.058%