INDEX
Explanations
discussions around contentious social and political topics
Negative Logits
anymore     -0.19
ilde        -0.16
raison      -0.16
urent       -0.16
aris        -0.16
åĨį         -0.15
èªł         -0.15
iek         -0.15
Everyday    -0.15
Again       -0.14
Positive Logits
previously  0.48
lately      0.45
recently    0.42
previous    0.38
before      0.38
Previously  0.34
elsewhere   0.34
Previously  0.31
Previous    0.31
before      0.30
Activations Density 0.344%