Explanation
Assertions related to the merit and credibility of arguments or claims
Negative Logits
myſelf      -0.85
pleaſure    -0.84
itſelf      -0.82
Efq         -0.81
purpoſe     -0.80
greateſt    -0.77
Conſ        -0.77
'\\;'       -0.76
Reſ         -0.75
reaſon      -0.74
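
Positive Logits
proposta       0.56  (Italian/Portuguese: "proposal")
предложение    0.53  (Russian: "proposal", also "sentence")
arguments      0.51
claims         0.51
идеи           0.51  (Russian: "ideas")
Tazama         0.51  (Swahili: "look", "see")
refuted        0.51
défend         0.50  (French: "defends")
claim          0.49
hver           0.47  (Danish/Norwegian: "each", "every")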

Activation density: 0.704%