INDEX
Explanations
proper nouns and phrases related to questioning or debating aspects of a topic
references to specific entities and criticisms of government and institutions
Negative Logits
.�        -0.66
.<        -0.63
oven      -0.61
};        -0.60
.''       -0.60
ŃĶ        -0.59
.;        -0.58
ģ«        -0.57
>.        -0.57
`.        -0.56
Positive Logits
exists    0.93
fails     0.93
couldn    0.92
lacks     0.87
might     0.86
shines    0.86
hasn      0.85
refuses   0.85
suddenly  0.84
behaves   0.84
Activations Density 0.532%