INDEX
Explanations
proper nouns, specifically names
references to specific individuals and companies
Negative Logits
ered            -0.79
overc           -0.77
aches           -0.75
sear            -0.73
cyclopedia      -0.71
grilled         -0.70
esy             -0.68
franc           -0.68
eer             -0.68
uration         -0.67
Positive Logits
DeL             0.80
ajor            0.76
asar            0.75
Lot             0.74
龍喚士          0.72
Buchanan        0.70
Bee             0.68
TPPStreamerBot  0.67
itism           0.67
Quin            0.66
Activations Density 0.034%