INDEX
Explanations
dates or events in political context
occurrences of the word "OR"
Negative Logits
place      -0.73
ãĤ©        -0.70
inctions   -0.65
pter       -0.65
inelli     -0.65
dayName    -0.63
utch       -0.61
strings    -0.61
Dolphin    -0.60
users      -0.58
Positive Logits
IENT       1.18
OR         1.10
GAN        1.07
IES        1.04
OGR        0.97
RY         0.96
IFIED      0.96
IZ         0.95
ATIVE      0.94
ourke      0.93
Activations Density: 0.010%