INDEX
Explanations
references to irony and unexpected contrasts
Negative Logits
Token          Logit
aley           -0.17
ãĥĪãĥª         -0.17
.SC            -0.16
ory            -0.16
yk             -0.15
ALA            -0.15
centration     -0.14
antasy         -0.14
lena           -0.14
ographics      -0.14
Positive Logits
Token          Logit
exactly        0.20
precisely      0.19
557            0.17
ä¼ı            0.15
caut           0.15
stile          0.14
assin          0.14
Harris         0.14
sem            0.14
one            0.14
Activations Density: 0.208%