INDEX
Explanations
specific company or organization names
names of organizations and companies
Negative Logits
Token       Logit
istics      -0.79
izations    -0.78
anza        -0.75
ruction     -0.72
ization     -0.71
eatures     -0.70
culosis     -0.67
orically    -0.66
Ñĭ          -0.65
eering      -0.65
Positive Logits
Token       Logit
ilon        0.85
tera        0.80
boro        0.77
acet        0.76
mos         0.76
hire        0.75
ulic        0.74
gpu         0.74
borough     0.74
ulla        0.73
Activations Density 0.032%