Explanations
phrases that describe rationality or logic
Negative Logits
rio             -0.18
sharedInstance  -0.18
cko             -0.16
èĭĹ             -0.16
ots             -0.15
INCIDENTAL      -0.15
itez            -0.15
edl             -0.15
ijken           -0.14
_Impl           -0.14
POSITIVE LOGITS
logical  0.19
logic    0.17
logic    0.17
éĢ       0.17
nar      0.16
Logical  0.16
arg      0.15
Logical  0.15
erral    0.15
fully    0.14
Activations Density 0.014%