INDEX
Explanations
names of individuals
names of people and organizations
Negative Logits
annis     -0.72
Tao       -0.72
chens     -0.68
race      -0.67
Commando  -0.65
nda       -0.63
ao        -0.62
Irish     -0.61
agi       -0.61
pora      -0.61
Positive Logits
ruary     0.87
enthal    0.81
esse      0.77
idays     0.73
oult      0.72
istic     0.70
theless   0.69
enberg    0.67
iflower   0.65
ickers    0.65
Activations Density: 0.032%