INDEX
Explanations
references to political figures and endorsements or statements about them
phrases associated with names and their significance
Negative Logits
olution       -0.75
olved         -0.64
olutions      -0.63
ombo          -0.63
processing    -0.62
ruction       -0.61
Parameter     -0.60
Abstract      -0.60
Storage       -0.59
apter         -0.59
Positive Logits
his           1.08
His           1.00
his           0.97
he            0.95
His           0.92
He            0.76
He            0.76
vou           0.76
temperament   0.73
himself       0.73
Activations Density 1.098%