Explanations
text from the beginning or middle of a math textbook
relations
Negative Logits
myſelf      -0.88
itſelf      -0.84
Jefus       -0.81
Efq         -0.81
الحره       -0.81
$_"         -0.77
Theſe       -0.77
Monfieur    -0.75
Houſe       -0.75
whoſe       -0.75
Positive Logits
relation       0.61
Relation       0.50
R              0.50
relation       0.48
relations      0.47
equivalence    0.47
RELATION       0.46
wachsene       0.45
ir             0.45
par            0.44
Activations Density 2.408%