INDEX
Explanations
proper nouns, such as names of places, people, and organizations
terms related to inquiries and evaluations
Negative Logits
disabling     -0.62
stimulating   -0.62
Introduced    -0.60
CLASSIFIED    -0.57
beginnings    -0.57
paren         -0.57
gradient      -0.57
Allows        -0.56
Casting       -0.55
ILA           -0.55
Positive Logits
belonged      0.95
hail          0.89
belong        0.85
wore          0.84
were          0.81
consisted     0.80
owe           0.77
behaved       0.77
knew          0.75
include       0.74
Activations Density 0.304%