INDEX
Explanations
numerical expressions like fractions and percentages
references to numerical data or metrics
Negative Logits
natureconservancy  -0.67
PUBLIC             -0.61
Anthropology       -0.57
familiar           -0.56
Wilde              -0.56
Artemis            -0.56
Akin               -0.56
grave              -0.55
Dialogue           -0.55
Corrections        -0.54
Positive Logits
ousand             1.03
otted              0.98
velength           0.95
perature           0.91
undred             0.89
opter              0.89
pped               0.84
enger              0.84
anded              0.83
pect               0.83
Activations Density: 0.127%