Explanations
phrases that indicate prior actions or events
Negative Logits
KommentareTeilen  -0.93
gdx               -0.85
habet             -0.85
Talley            -0.72
Messer            -0.69
genstein          -0.68
ſelf              -0.68
følge             -0.67
gany              -0.66
ณ์                -0.66
Positive Logits
before   1.87
before   1.86
Before   1.81
BEFORE   1.81
BEFORE   1.74
Before   1.72
sebelum  1.48
innan    1.39
befo     1.37
πριν     1.33
Activations Density: 0.105%