Explanations
explaining arguments and follow-up points
Negative Logits
attrezz  0.50
Gigabit  0.48
Fleurit  0.43
スケジュール  0.42
বছর  0.42
বছর  0.41
sfrutt  0.41
ເຄື່ອງ  0.40
தினமும்  0.40
Programm  0.40
Positive Logits
argument  1.01
аргу  0.95
disagree  0.91
disagreement  0.90
arguing  0.87
arguments  0.85
argu  0.85
argumentative  0.85
disagreed  0.83
argumentation  0.82
Activations Density 0.049%