INDEX
Explanations
dates formatted as month followed by a number, such as "June 3rd"
dates and temporal references in the text
Negative Logits
hygiene     -0.69
igans       -0.61
CARD        -0.59
illas       -0.57
ãĥ¼ãĥĨ      -0.57
udging      -0.56
igham       -0.56
interven    -0.56
idences     -0.55
ãĥķãĤ¡      -0.54
Positive Logits
th          1.01
rd          0.99
TH          0.79
nd          0.78
â̳          0.72
ths         0.71
2200        0.70
â̲          0.70
stice       0.69
itia        0.69
Activations Density 0.075%