INDEX
Explanations
however, but, however, but contrast
Negative Logits
enk  0.47
idez  0.46
zen  0.44
pan  0.44
ായിരുന്നു  0.44  (Malayalam: "was, had been")
iii  0.42
питань  0.41  (Ukrainian: "of questions, of issues")
ardon  0.41
aforesaid  0.40
conclusão  0.40  (Portuguese: "conclusion")
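Positive Logits
smaller  0.49
هنوز  0.47  (Persian: "still, yet")
kleinere  0.46  (German: "smaller")
grundsätzlich  0.46  (German: "fundamentally, in principle")
weaker  0.45
整体  0.45  (Chinese, Simplified: "overall, whole")
최근  0.45  (Korean: "recent, recently")
generally  0.45
整體  0.45  (Chinese, Traditional: "overall, whole")
البعض  0.43  (Arabic: "some, one another")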
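The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. One common way to produce such lists, assumed here rather than confirmed by this page, is to multiply the feature's decoder vector by the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below does that with NumPy; `top_logit_tokens`, `w_dec_row`, and `W_U` are hypothetical names, and the toy matrices stand in for real model weights. Scores are returned signed.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by one feature's direct effect on the output logits.

    Assumed shapes (hypothetical, not taken from the page):
      w_dec_row: (d_model,)        decoder vector of a single SAE feature
      W_U:       (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (positive, negative), each a list of (token, signed score).
    """
    logits = w_dec_row @ W_U          # (d_vocab,) effect on each token's logit
    order = np.argsort(logits)        # token indices sorted by ascending score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [0.0,  0.0, 0.0,  0.0]])
vocab = ["a", "b", "c", "d"]
positive, negative = top_logit_tokens(w, W_U, vocab, k=2)
print(positive)  # [('a', 0.5), ('c', 0.1)]
print(negative)  # [('d', -0.4), ('b', -0.2)]
```

Because only the decoder direction and the unembedding are involved, this measures the feature's direct path to the logits and ignores any effect routed through later layers.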
Activation Density 0.002%
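Activation density is the share of tokens on which the feature fires, so 0.002% works out to about 2 tokens in every 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero (a dashboard may use a different cutoff) and using the hypothetical helper `activation_density`:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy run: 100,000 tokens, the feature fires on exactly two of them.
acts = np.zeros(100_000)
acts[[10, 500]] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.002%
```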