INDEX
Explanations
pronouns and verbs related to actions or decisions
actions related to helping or supporting others
Negative Logits
>>>>>>>>                          -0.59
................................  -0.58
[/                                -0.57
reports                           -0.51
++++++++                          -0.50
Explain                           -0.50
////////////////////////////////  -0.49
udos                              -0.49
++++++++++++++++                  -0.49
Nay                               -0.49
Positive Logits
supposedly    0.60
otherwise     0.57
ufact         0.55
deems         0.54
deemed        0.53
inevitably    0.52
legitimately  0.52
deem          0.51
udic          0.51
allegedly     0.50
Activations Density: 0.923%