INDEX
Explanations
phrases that indicate rationality or logical reasoning
Negative Logits
Token            Logit
XmlAccessorType  -0.61
VICT             -0.60
Kel              -0.56
X                -0.56
Pur              -0.55
I                -0.55
Lu               -0.54
gridx            -0.54
O                -0.54
se               -0.53
Positive Logits
Token        Logit
reasonable   1.66
Reasonable   1.63
Reasonable   1.55
razonable    1.52
reasonable   1.48
raisonnable  1.46
reaſon       1.39
reasonably   1.36
Theſe        1.24
Houſe        1.21
Activations Density 0.204%
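
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the share of tokens on which the feature has a non-zero activation, expressed as a percentage. This is an assumption about the metric, not something the page itself states; the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the feature
    fires. The dashboard does not document its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [1.2, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints 0.200%
```

Under this reading, a density of 0.204% would mean the feature is active on roughly 2 of every 1000 tokens in the sampled text.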