INDEX
Explanations
phrases that indicate reasoning or justification
Negative Logits
myſelf            -1.12
Theſe             -1.11
ArrowToggle       -1.09
itſelf            -1.03
theſe             -0.97
himſelf           -0.94
Roskov            -0.93
RegressionTest    -0.92
Wikimedijinoj     -0.89
contextLoads      -0.87
Positive Logits
perché            0.85
because           0.81
Perché            0.81
Porque            0.80
perchè            0.77
Because           0.76
porque            0.76
Because           0.74
because           0.72
sababu            0.72
Activations Density 0.143%
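
The density figure presumably reports the fraction of sampled tokens on which this feature has a nonzero activation. That definition is an assumption, since the page does not state it. Under that reading, a minimal sketch of the arithmetic looks like this; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumes "density" means: (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data, not taken from the page:
# 100,000 tokens, of which 143 have a nonzero activation.
toy = [0.0] * 99_857 + [1.5] * 143

density = activation_density(toy)
print(f"{density:.3%}")
```

Read this way, a density of 0.143% would mean the feature fires on roughly 1.4 of every 1,000 tokens, which fits a sparse feature tied to a narrow set of words such as "because" and its translations.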