INDEX
Explanations
verbs that express actions or behaviors
negations or expressions of inability or impossibility
Negative Logits
catentry     -0.73
study        -0.67
Almighty     -0.62
LAB          -0.58
LIN          -0.58
ithing       -0.57
ilst         -0.57
Sabbath      -0.55
spectrum     -0.55
Milton       -0.54
Positive Logits
anymore       1.29
bothered      0.93
ANY           0.84
*/(           0.83
nor           0.79
anywhere      0.79
Enough        0.76
bothering     0.75
bother        0.74
enough        0.74
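On SAE feature dashboards, these two lists are usually the two ends of a single per-token score, roughly how much the feature's decoder direction raises or lowers each vocabulary token's logit. The following is a minimal sketch of selecting both ends of such a score. It assumes the scores are already available as a plain Python dict. The `logit_effects` mapping and the `top_and_bottom` helper are hypothetical, and only a few rows are copied from the lists above.

```python
# Hypothetical per-token scores: a few rows copied from the lists above.
logit_effects = {
    "catentry": -0.73,
    "study": -0.67,
    "Almighty": -0.62,
    "anymore": 1.29,
    "bothered": 0.93,
    "ANY": 0.84,
}

def top_and_bottom(effects, k):
    """Return the k highest-scoring positive and k lowest-scoring negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by score
    negative = [pair for pair in ranked if pair[1] < 0][:k]  # most negative first
    positive = [pair for pair in reversed(ranked) if pair[1] > 0][:k]  # most positive first
    return positive, negative

positive, negative = top_and_bottom(logit_effects, k=3)
print(positive)  # [('anymore', 1.29), ('bothered', 0.93), ('ANY', 0.84)]
print(negative)  # [('catentry', -0.73), ('study', -0.67), ('Almighty', -0.62)]
```

The sign filters keep a token from showing up in both lists when `k` exceeds the number of positive or negative entries.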
Activation Density: 0.388%
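Activation density is typically the share of sampled tokens on which the feature is active. A value of 0.388% means the feature fires on roughly 4 tokens in every 1,000. The following is a minimal sketch that assumes "active" means an activation strictly above a threshold of 0. Both the threshold and the `activation_density` name are hypothetical, and the example activations are made up.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical activations over 1,000 tokens: the feature fires on 4 of them.
acts = [0.0] * 996 + [1.2, 0.5, 3.1, 0.9]
print(f"{activation_density(acts):.3f}%")  # 0.400%
```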