INDEX
Explanations
expressions of uncertainty or subjective opinions
preceding explanations or justifications
explaining cause or reason
Negative Logits
findpost      -0.82
abestanden    -0.81
itſelf        -0.69
мәкал         -0.69
Obrázky       -0.67
Maho          -0.66
Efq           -0.64
OGND          -0.64
Jefus         -0.64
ویکیپدیای     -0.63
Positive Logits
because       1.16
due           1.00
because       0.94
porque        0.89
karena        0.86
Because       0.83
的原因        0.80
потому        0.79
是因為        0.79
Because       0.78
Activations Density 0.339%