Explanations
mentions of types, categories, and classifications within various contexts
questions and answers
Negative Logits
SequentialGroup  -0.61
poffible         -0.59
<unused43>       -0.57
<unused41>       -0.57
<unused3>        -0.57
<unused42>       -0.57
<unused51>       -0.57
<unused8>        -0.56
[@BOS@]          -0.56
<pad>            -0.56
Positive Logits
you      0.49
he       0.44
they     0.43
it       0.41
ass      0.40
pa       0.38
we       0.38
you      0.38
I        0.37
للاسماء  0.37
Activations Density 0.018%