Explanations
modal verbs indicating ability or permission
Negative Logits
atively       -0.18
ically        -0.17
ought         -0.16
ucz           -0.15
themselves    -0.15
unday         -0.14
itself        -0.14
bilin         -0.14
ék            -0.14
ughters       -0.14
Positive Logits
expect        0.27
always        0.26
bet           0.25
expect        0.22
Bet           0.21
either        0.21
always        0.21
Always        0.20
imagine       0.20
Always        0.19
Activations Density 0.119%