INDEX
Explanations
causal relationships and explanations, especially those starting with "because."
Negative Logits
Token           Logit
MLLoader        -0.73
matchCondition  -0.65
invokeLater     -0.64
femininos       -0.61
houſe           -0.60
Verſ            -0.60
Houſe           -0.58
sentenza        -0.57
abestanden      -0.56
Monfieur        -0.56
Positive Logits
Token    Logit
because  0.84
because  0.68
weil     0.68
Porque   0.67
porque   0.64
omdat    0.63
لأنه     0.61
karena   0.60
Porque   0.59
Because  0.59
Activations Density 0.829%