Explanations
verbs indicating action or decision
phrases related to the concept of "taking" or "not taking" something
Negative Logits
lehem    -0.70
naire    -0.69
alam     -0.67
éŃĶ      -0.67
icol     -0.66
fw       -0.66
女       -0.65
lex      -0.65
ingen    -0.64
nell     -0.64
Positive Logits
kindly            1.16
advantage         0.96
anymore           0.94
lightly           0.88
responsibility    0.86
seriously         0.86
sides             0.85
aback             0.82
aways             0.81
heed              0.79
Activations Density: 0.049%