Explanations
words related to specific entities such as names of people, places, or organizations
names of people or organizations associated with leadership roles
Negative Logits
ãĥ¼ãĥĨ        -0.62
referen       -0.57
ãĥ¼ãĥĨãĤ£     -0.55
[|            -0.52
ãĥĩãĤ£        -0.51
$$$$          -0.49
代            -0.48
thous         -0.47
éĽ            -0.46
lished        -0.46
Positive Logits
mel           0.51
NetMessage    0.51
amba          0.49
grim          0.49
gall          0.48
ANS           0.46
han           0.46
olin          0.45
âĢº           0.45
agy           0.45
Activations Density 2.327%