INDEX
Explanations
names of people or entities
proper nouns, specifically names and titles
Negative Logits
proportion    -0.64
ãĥģ           -0.60
FedEx         -0.60
Thief         -0.60
Costco        -0.58
cartel        -0.56
quadru        -0.56
AAA           -0.56
CTR           -0.56
Tup           -0.56
Positive Logits
enegger       1.08
ricks         0.90
jen           0.87
yk            0.83
kson          0.83
enson         0.80
rov           0.75
esson         0.73
recalled      0.73
sung          0.72
Activations Density: 0.601%