INDEX
Explanations
phrases related to superlatives or rankings
references to "the world" and its various aspects or distinctions
Negative Logits
xual        -0.85
ality       -0.84
nels        -0.83
rade        -0.76
irement     -0.76
VP          -0.72
zes         -0.71
lp          -0.71
CHAT        -0.70
wal         -0.70
Positive Logits
largest     1.18
richest     1.07
wealthiest  1.04
poorest     1.02
fastest     1.02
tallest     1.00
finest      0.99
deadliest   0.98
toughest    0.96
busiest     0.95
Activations Density 0.066%