Explanation: arguing or presenting arguments
Negative Logits
Token         Value
least         0.42
least         0.39
oon           0.38
difíciles     0.34   (Spanish, "difficult")
liber         0.34
tendencies    0.34
much          0.34
definitely    0.34
ゲット        0.33   (Japanese katakana, "get")
mucho         0.33   (Spanish, "much")
Positive Logits
Token         Value
argument      0.64
аргу          0.61   (Russian fragment of "аргумент", "argument")
argument      0.58
Argument      0.52
arguments     0.52
Argument      0.52
argumentation 0.52
arguments     0.46
якобы         0.46   (Russian, "supposedly")
arguing       0.44
Activation density: 0.105%
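The sketch below shows how lists and a density figure like the ones above are commonly produced for a sparse-autoencoder feature. It assumes a convention the dashboard does not state: the logit effect of a token is the dot product of the feature's decoder direction with that token's unembedding vector. The "positive" list takes the largest effects and the "negative" list takes the smallest. Activation density is the percentage of tokens on which the feature's activation is above zero, so 0.105% means the feature fires on roughly one token in a thousand. All function names and the toy data are hypothetical. This is a pure-Python illustration, not the dashboard's actual code.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_vec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col)) for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k tokens with the largest effects and the k tokens with the smallest effects."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


def activation_density(activations: List[float]) -> float:
    """Percentage of tokens on which the feature fires (activation strictly above zero)."""
    if not activations:
        return 0.0
    return 100.0 * sum(1 for a in activations if a > 0) / len(activations)


# Toy example with 2-dimensional vectors (made-up numbers, not taken from the dashboard).
decoder = [1.0, 0.5]
unembed = {
    "argument": [0.6, 0.2],
    "least": [-0.4, 0.1],
    "much": [-0.1, -0.1],
    "the": [0.0, 0.0],
}
pos, neg = top_and_bottom(logit_effects(decoder, unembed), 1)
density = activation_density([0.0] * 999 + [2.5])  # one active token out of 1000
```

With these toy values, "argument" has the largest effect (about 0.7) and "least" has the smallest (about -0.35). The density comes out to 0.1%.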