INDEX
Explanations
affirmations or confirmations of statements
Negative Logits
Jefus               -0.75
Salvatore           -0.75
^(@)                -0.74
FOS                 -0.74
Maren               -0.73
Efq                 -0.73
Moslem              -0.73
AntiForgeryToken    -0.70
LOS                 -0.69
}}">                -0.69
Positive Logits
indeed              1.77
indeed              1.69
Indeed              1.62
Indeed              1.61
的确                0.86
确实                0.81
確實                0.76
inderdaad           0.73
fact                0.73
的確                0.71
Activations Density 0.082%