Explanations
specific phrases or terms related to political or social events
phrases that emphasize prominence or significance in various contexts
Negative Logits
Token       Logit
udence      -0.70
rade        -0.70
zai         -0.69
iety        -0.69
icho        -0.69
iterator    -0.68
cohol       -0.67
han         -0.65
gae         -0.65
lich        -0.63
Positive Logits
Token       Logit
brunt       1.39
lion        1.35
reins       1.28
mantle      1.24
blame       1.23
spo         1.17
bulk        1.16
spotlight   1.12
majority    1.07
slack       1.06
Activations Density: 0.209%