INDEX
Explanations
names of specific individuals
names of prominent figures and organizations
Negative Logits
awa         -0.61
Defin       -0.59
OPLE        -0.59
è¦ļéĨĴ      -0.58
curve       -0.58
©¶æ         -0.57
gradient    -0.57
Democr      -0.57
rainbow     -0.56
fog         -0.55
Positive Logits
olver       0.67
etc         0.66
agen        0.63
)'          0.63
awan        0.63
guard       0.62
oshenko     0.61
rup         0.61
sat         0.60
avia        0.60
Activations Density: 0.441%