Explanations
names of individuals, particularly those associated with various groups or events
Negative Logits
antino     -0.21
ctor       -0.19
orex       -0.17
ctic       -0.16
Staples    -0.15
cter       -0.15
ekt        -0.14
ctors      -0.14
Katz       -0.14
ereotype   -0.14
Positive Logits
ataire     0.15
uelle      0.13
imeline    0.13
elled      0.13
Bom        0.13
ength      0.13
Cox        0.13
ä½ľç͍      0.13
iddles     0.13
Cabr       0.12
Activation Density: 0.075%