INDEX
Explanations
introduces explanation or contrast
Negative Logits
thereby     1.95
which       1.87
which       1.87
ensuring    1.62
keeping     1.61
从而        1.59
allowing    1.55
hoping      1.52
Which       1.47
aiming      1.44
Positive Logits
exista      0.85
Хотя        0.81
quoique     0.81
existir     0.80
существует  0.80
Exists      0.80
istnieje    0.80
хотя        0.80
esiste      0.78
conviene    0.75
Activations Density: 0.068%