Explanations
proper names or acronyms
names or identifiers of individuals and entities
Negative Logits
Token   Logit
Pen     -0.91
pen     -0.89
237     -0.80
Radar   -0.80
Noir    -0.78
Siren   -0.77
Rox     -0.76
RN      -0.75
FF      -0.74
pens    -0.74
Positive Logits
Token   Logit
AND     1.25
ands    1.17
ande    1.16
and     1.12
anders  1.11
older   1.04
ander   1.02
anding  0.99
anda    0.98
anded   0.98
Activations Density: 0.327%