Explanations
names and identifiers related to people, places, and items in lists or categorization contexts
Negative Logits
ać         -0.14
ancel      -0.14
rival      -0.14
mage       -0.14
phia       -0.13
uffman     -0.13
burgh      -0.13
various    -0.13
esor       -0.12
そう       -0.12
Positive Logits
excluded   0.17
only       0.16
only       0.16
ustum      0.15
overlaps   0.15
ONLY       0.15
pig        0.15
INCLUDED   0.15
ono        0.14
Pig        0.14
Activations Density 0.158%