Explanations
The word "reason" in various forms, indicating explanations or justifications.
Negative Logits
haustible     -0.93
orgeous       -0.92
}$            -0.90
الحره         -0.88
zsef          -0.86
extAlignment  -0.86
كومونز        -0.86
$             -0.85
समीक्षक       -0.85
edipus        -0.84
Positive Logits
reasons   1.94
reason    1.80
Reasons   1.72
REASON    1.68
reasons   1.66
Reason    1.66
reason    1.56
Reason    1.54
Reasons   1.52
REASONS   1.49
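The page does not say how the logit lists are computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix (W_U) and keep the k highest and k lowest scores per vocabulary entry. Under that assumption, entries that render alike but are distinct tokens (presumably, for example, with and without a leading space) each get their own row, which would explain why "reasons" and "reason" appear twice with different values. The names `logit_effects`, `decoder_vec`, `unembed`, and `vocab` are hypothetical, and this sketch ignores any final layer norm or scaling a real dashboard might apply.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],          # hypothetical: feature's decoder direction, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: W_U laid out as d_model rows x vocab columns
    vocab: Sequence[str],                  # token strings, one per W_U column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) top-k tokens by direct logit effect (sketch)."""
    d_model = len(decoder_vec)
    # effect[t] = sum_j decoder_vec[j] * W_U[j][t]  (a plain dot product per token)
    effects = [
        sum(decoder_vec[j] * unembed[j][t] for j in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    pos, neg = logit_effects(
        decoder_vec=[1.0, 0.5],
        unembed=[[2.0, 0.0, -1.0, 1.0], [0.0, 2.0, 0.0, -2.0]],
        vocab=["reason", "reasons", "orgeous", "$"],
        k=2,
    )
    print("positive:", pos)
    print("negative:", neg)
```

With the toy numbers, "reason" scores 2.0 and "orgeous" scores -1.0, so they head the positive and negative lists respectively.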
Activation Density: 0.092%
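Activation density is read here as the fraction of sampled tokens on which the feature fires, so 0.092% corresponds to roughly 92 firing tokens per 100,000 (0.00092 × 100,000 = 92). The sketch below assumes "fires" means an activation strictly above zero, as is typical for ReLU-style sparse autoencoders; the threshold and the function name `activation_density` are assumptions, not taken from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 92 firing tokens out of 100,000 sampled tokens (synthetic data).
    acts = [0.0] * 99_908 + [1.0] * 92
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```

Because density is a simple ratio, it is only as reliable as the token sample behind it; a very sparse feature like this one needs a large sample before the estimate stabilizes.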