Explanations
- Adjectives expressing extreme positivity or negativity
- Terms related to scale, particularly large or significant concepts
Negative Logits
Logit  Token
-0.77  tops
-0.64  THER
-0.63  GH
-0.63  プ (byte-level token ãĥĹ)
-0.62  PLA
-0.61  ighth
-0.61  ドラゴン (byte-level token ãĥīãĥ©ãĤ´ãĥ³)
-0.61  ij士
-0.61  ベ (byte-level token ãĥĻ)
-0.60  asio
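Several negative-logit tokens (ãĥĹ, ãĥīãĥ©ãĤ´ãĥ³, ãĥĻ) appear to be GPT-2-style byte-level BPE strings, in which each UTF-8 byte is displayed as a printable stand-in character. The sketch below reverses that mapping, assuming the model uses the GPT-2 byte-to-unicode table (the dashboard does not state the tokenizer). The table is rebuilt here rather than imported, and the function names are illustrative, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable-character table (assumed tokenizer)."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: stand-in character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into UTF-8 text.

    Raises KeyError if the token contains a character outside the table.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥĹ", "ãĥīãĥ©ãĤ´ãĥ³", "ãĥĻ"]:
    print(tok, "->", decode_token(tok))
```

Under this assumption the three tokens decode to the katakana プ, ドラゴン ("dragon"), and ベ; plain ASCII tokens such as tops decode to themselves.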
Positive Logits
Logit  Token
0.73   too
0.63   lest
0.62   iated
0.62   bait
0.61   cooks
0.60   lately
0.59   aban
0.58   compliment
0.57   amus
0.57   sighted
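The dashboard does not say how these logit scores are computed. One common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the unembedding matrix W_U and list the k highest and lowest per-token scores. The sketch uses toy numbers; the direction, matrix, and tok_* names are hypothetical.

```python
import heapq


def logit_effects(direction, w_unembed):
    """Score every vocabulary token as direction . W_U[:, token].

    w_unembed is laid out as [d_model][vocab_size]; returns one score per token.
    """
    d_model = len(direction)
    vocab_size = len(w_unembed[0])
    return [
        sum(direction[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    top = heapq.nlargest(k, pairs, key=lambda p: p[1])
    bottom = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    return top, bottom


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
direction = [1.0, 0.5]
w_unembed = [
    [0.2, -0.4, 0.6, 0.0],
    [0.4, 0.2, -0.2, -1.0],
]

scores = logit_effects(direction, w_unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print("positive:", [name for name, _ in top])     # ['tok_c', 'tok_a']
print("negative:", [name for name, _ in bottom])  # ['tok_d', 'tok_b']
```

heapq.nlargest and heapq.nsmallest avoid sorting the full vocabulary, which matters when the vocabulary has tens of thousands of entries.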
Activation density: 0.105%
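Activation density is commonly read as the fraction of sampled tokens on which the feature's activation is nonzero; that definition is an assumption here, not something the dashboard states. The sketch below uses a toy activation list chosen so that 21 of 20,000 tokens fire, which reproduces the 0.105% figure by construction only (it is not real data).

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data, not real activations: 21 firing tokens out of 20,000.
acts = [0.0] * 19_979 + [1.2] * 21
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # Activation density: 0.105%
```

Raising threshold above zero would count only stronger activations, which lowers the reported density.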