INDEX
Explanations
emphasizing adjectives and verbs
Negative Logits
diamante       0.41
trp            0.41
γε             0.39
shreds         0.39
⎦              0.39
Pose           0.38
ने             0.37
Margins        0.37
бер            0.37
స్తారు         0.37
Positive Logits
unwise         0.51
inconceivable  0.50
acaso          0.46
impossible     0.44
imperative     0.42
数为           0.41
necessário     0.40
irresponsible  0.40
konie          0.39
perverse       0.39
Activation Density: 0.039%