Explanations
dates written in a specific format
occurrences of the sequence "01" in various contexts
Negative Logits
exha        -0.93
awaru       -0.89
hemor       -0.85
notor       -0.79
tradem      -0.78
hung        -0.76
unaff       -0.74
pressured   -0.74
volunte     -0.74
hypot       -0.71
Positive Logits
01          1.28
03          1.14
003         1.12
02          1.10
000000      1.08
009         1.08
0100        1.07
001         1.07
05          1.07
011         1.06
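The two logit lists rank vocabulary tokens by how strongly this feature's output pushes their logits up or down. Below is a minimal sketch of how such lists could be derived. It assumes, without confirmation from this page, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function name `top_logit_effects` and the toy vectors are hypothetical and chosen only for illustration.

```python
def top_logit_effects(decoder_dir, unembed, k=10):
    """Rank tokens by the logit change a unit feature activation would cause.

    Assumption: effect(token) = dot(decoder_dir, unembed[token]).
    decoder_dir: the feature's decoder direction (length d_model).
    unembed: dict mapping each token string to its unembedding vector (length d_model).
    Returns (top_positive, top_negative), each a list of (token, value) pairs.
    """
    effects = {
        tok: sum(a * b for a, b in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Largest effects first
    top_positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Most negative effects first
    top_negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return top_positive, top_negative


# Hypothetical toy example with d_model = 3 (values are made up)
decoder_dir = [1.0, 0.5, 0.0]
unembed = {
    "01": [1.0, 0.5, 0.2],
    "03": [0.8, 0.4, 0.0],
    "hung": [-0.6, -0.3, 0.1],
    "exha": [-0.9, 0.0, 0.5],
}
pos, neg = top_logit_effects(decoder_dir, unembed, k=2)
print(pos)
print(neg)
```

Sorting the full vocabulary twice is fine for a sketch. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids a full sort.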
Activation density: 0.011%
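Activation density is the share of tokens on which the feature is active. A density of 0.011% corresponds to roughly one token in 9,000. The following minimal sketch assumes that "active" means a strictly positive activation, because the page does not state its threshold. The helper `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 tokens: the feature fires on one of them
acts = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 12.5
```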