Explanations
statements or arguments related to information or facts
phrases indicating unusual or noteworthy facts
Negative Logits
pione                   -0.87
iership                 -0.82
ü                       -0.78
ā                       -0.77
ė                       -0.77
÷                       -0.77
(token not displayed)   -0.77
ù                       -0.77
ğ                       -0.77
ø                       -0.77
Positive Logits
unlike          1.05
none            1.03
according       1.00
neither         1.00
despite         0.97
interestingly   0.94
nobody          0.94
whereas         0.94
according       0.93
although        0.91
Activation Density: 0.472%
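The figures above can be read with the help of a minimal sketch. It assumes that a feature's logit weights are its decoder direction projected onto each token's unembedding vector, and that activation density is the fraction of token positions where the feature's activation exceeds zero. The function names (feature_logit_weights, top_logits, activation_density) are hypothetical and illustrative only; they do not come from any particular library, and the toy inputs are made up.

```python
def feature_logit_weights(decoder_direction, unembedding):
    """Hypothetical: dot the feature's decoder direction with each token's
    unembedding vector (one d_model-length vector per vocabulary token)."""
    return [sum(d * u for d, u in zip(decoder_direction, token_vec))
            for token_vec in unembedding]


def top_logits(logit_weights, vocab, k=10):
    """Return the k most negative and k most positive (token, weight) pairs."""
    ranked = sorted(zip(vocab, logit_weights), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest weights first
    positive = ranked[::-1][:k]    # largest weights first
    return negative, positive


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage with made-up weights for a five-token vocabulary.
vocab = ["unlike", "none", "pione", "iership", "the"]
weights = [1.05, 1.03, -0.87, -0.82, 0.1]
neg, pos = top_logits(weights, vocab, k=2)
print(neg)  # [('pione', -0.87), ('iership', -0.82)]
print(pos)  # [('unlike', 1.05), ('none', 1.03)]
```

Under these assumptions, a density of 0.472% corresponds to roughly 472 firing positions per 100,000 tokens.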