INDEX
Explanations
words indicating the historical or original context of something
Negative Logits
current       -0.52
and           -0.51
lle           -0.51
ings          -0.50
individual    -0.49
Current       -0.48
al            -0.48
czo           -0.48
ute           -0.47
ic            -0.47
Positive Logits
originally      2.74
Originally      2.59
originally      2.45
Originally      2.40
originalmente   2.30
ursprünglich    1.99
initially       1.84
Initially       1.58
inicialmente    1.57
Initially       1.56
Activations Density: 0.056%