INDEX
Explanations
words related to specific organizations or entities
Negative Logits
Token      Logit
ienced     -0.82
loo        -0.81
neys       -0.69
rosse      -0.68
iences     -0.67
tons       -0.67
lington    -0.66
inson      -0.64
ingham     -0.62
Mata       -0.62
Positive Logits
Token      Logit
ACP        1.12
UFC        1.01
ZI         1.01
emonic     1.00
iversal    0.92
STAR       0.92
umeric     0.91
IS         0.89
guyen      0.88
FU         0.88
Activations Density: 0.068%