INDEX
Explanations
phrases referring to causes or explanations
Negative Logits
extAlignment    -0.98
orgeous         -0.96
الحره           -0.94
BeginContext    -0.88
Datuak          -0.87
Atsauces        -0.86
كومونز          -0.86
haustible       -0.85
edipus          -0.85
zsef            -0.84
Positive Logits
reasons         1.65
Reasons         1.47
reason          1.45
Reason          1.43
REASON          1.39
reasons         1.39
Reason          1.30
Reasons         1.27
REASONS         1.25
reason          1.25
Activations Density: 0.083%