Explanations
phrases indicating prior actions or events
Negative Logits
reaſon      -0.77
pleaſure    -0.77
ſeveral     -0.77
Jefus       -0.75
ſelf        -0.74
purpoſe     -0.70
raiſ        -0.69
Conſ        -0.69
ſelves      -0.68
ſtate       -0.68
Positive Logits
before      0.94
sebelum     0.94
πριν        0.83
bevor       0.82
before      0.82
før         0.81
BEFORE      0.79
BEFORE      0.74
Sebelum     0.74
voordat     0.74
Activations Density: 0.237%