INDEX
Explanations
phrases that introduce examples or list components
Negative Logits
Token      Logit
Anſ        -0.63
another    -0.62
cà         -0.60
مشين       -0.59   (Arabic, "shameful")
تاريخ      -0.59   (Arabic, "history" or "date")
Jefus      -0.58
Conſ       -0.58
Diſ        -0.57
ſen        -0.56
Ύ          -0.56
Positive Logits
Token      Logit
including   0.88
voorbeeld   0.87   (Dutch, "example")
INCLUDING   0.86
zoals       0.86   (Dutch, "such as")
like        0.86
including   0.82
telles      0.82   (French, "such")
např        0.81   (Czech abbreviation, "e.g.")
Including   0.79
INCLUDING   0.77
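The two logit lists rank vocabulary tokens by how strongly this feature pushes them up or down. A minimal sketch of how such rankings are often produced, assuming each score is the feature's decoder direction dotted with a token's column of the model's unembedding matrix (a common convention for these dashboards, not confirmed by this page). The function names `logit_scores` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_scores(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumed (hypothetical) shapes: decoder_dir has d_model entries and
    unembed is a d_model x vocab_size matrix stored as a list of rows.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    scores = []
    for j, token in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, s))
    return scores


def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest scores (descending) and the k lowest (ascending)."""
    if k <= 0:
        return [], []
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    top = ranked[-k:][::-1]  # largest first, like "Positive Logits"
    bottom = ranked[:k]      # most negative first, like "Negative Logits"
    return top, bottom


if __name__ == "__main__":
    # Toy values for illustration only; they are not the dashboard's data.
    vocab = ["including", "like", "another", "Anſ"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [0.9, 0.7, -0.6, -0.7],
        [0.0, 0.2, 0.0, 0.1],
    ]
    top, bottom = top_and_bottom(logit_scores(decoder_dir, unembed, vocab), k=2)
    print("positive:", [(t, round(s, 2)) for t, s in top])
    print("negative:", [(t, round(s, 2)) for t, s in bottom])
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real vocabulary would typically use a vectorized matrix product instead of the explicit loop.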
Activation Density: 0.152%
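At 0.152%, the feature fires on roughly 152 of every 100,000 tokens, assuming density means the share of token positions with an activation above zero (the usual dashboard convention; this page does not define it). A minimal sketch under that assumption; the function `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 152 firing positions out of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 152 * 500, 500):  # 152 evenly spaced positions
        acts[i] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints 0.152%
```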