Explanations
references to user complaints and suggestions in application feedback
Negative Logits
Moreover    -0.38
himself     -0.37
herself     -0.36
Moreover    -0.32
あるいは    -0.31
bowiem      -0.31
moreover    -0.30
のである    -0.30
himself     -0.29
herself     -0.29
Positive Logits
Been        1.20
Been        1.20
Didn        1.03
Seems       1.02
Gonna       1.02
Wasn        0.98
Feels       0.98
got         0.98
gotta       0.97
Gonna       0.97
Activations Density 0.549%