INDEX
Explanations
phrases related to prior conditions or events
Negative Logits
Token           Logit
pleaſure        -0.86
AssemblyTitle   -0.85
habet           -0.75
Jefus           -0.74
ſelf            -0.72
fometimes       -0.71
itſelf          -0.71
pleaf           -0.70
ítě             -0.68
reafon          -0.68
Positive Logits
Token           Logit
before          2.16
before          2.06
Before          1.97
Before          1.91
BEFORE          1.85
BEFORE          1.77
sebelum         1.73
antes           1.66
πριν            1.57
innan           1.53
Activations Density 0.143%
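
For readers unfamiliar with the density figure, the sketch below shows one plausible way such a number could be computed, assuming it means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical and chosen only for illustration; the page does not state how the value was actually derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. This definition is not confirmed by the page.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 143 of which activate the feature.
sample = [1.0] * 143 + [0.0] * (100_000 - 143)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.143%
```

A density this low would mean the feature fires on roughly one token in 700, which is consistent with a feature tied to a narrow family of tokens such as "before" and its translations.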