Explanations
phrases indicating individual instances or quantities
references to individual entities or components within a broader context
Negative Logits
osponsors    -0.72
imeters      -0.70
icut         -0.66
actionGroup  -0.65
ibility      -0.65
inity        -0.64
vered        -0.63
srf          -0.61
ategory      -0.61
exting       -0.61
Positive Logits
hundred      0.88
esan         0.85
Hundred      0.81
of           0.80
thousand     0.77
else         0.70
;;;;;;;;     0.70
hots         0.68
million      0.65
usercontent  0.64
Activations Density: 0.041%