INDEX
Explanations
names of individuals or locations
Negative Logits
ivity        -0.98
istics       -0.87
ually        -0.83
iants        -0.83
iveness      -0.81
istically    -0.81
inant        -0.78
iant         -0.78
opher        -0.77
oted         -0.76
Positive Logits
lic          0.92
cles         0.82
ursed        0.80
lys          0.73
odies        0.72
rito         0.70
mble         0.67
rette        0.67
anners       0.66
ONSORED      0.66
Activations Density: 0.076%