INDEX
Explanations
words related to persistence or continuity over time
Negative Logits
Token      Logit
already    -0.17
Already    -0.17
ARISING    -0.17
гал        -0.16
Already    -0.16
oland      -0.16
artık      -0.16
already    -0.16
gone       -0.14
onec       -0.14
Positive Logits
Token        Logit
unchanged    0.32
intact       0.31
ders         0.29
steadfast    0.27
constant     0.26
untouched    0.25
faithful     0.23
unaffected   0.23
true         0.22
steady       0.22
Activations Density 0.042%
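
The page does not define how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical toy data: 10,000 token positions, 4 of which activate the feature.
toy = [0.0] * 9996 + [0.7, 1.2, 0.3, 2.1]
print(f"{activation_density(toy):.3f}%")
```

Under this reading, a density of 0.042% would correspond to the feature firing on roughly 4 of every 10,000 sampled tokens.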