INDEX
Explanations
references to geographical locations and their cultural significance
Negative Logits
Token       Logit
Earlier     -0.16
earlier     -0.15
anta        -0.14
Earlier     -0.14
previously  -0.13
ande        -0.13
_advanced   -0.13
or          -0.13
apg         -0.13
Previous    -0.13
Positive Logits
Token       Logit
before      0.84
before      0.78
antes       0.66
Before      0.63
Before      0.62
_before     0.59
.before     0.56
-before     0.56
before      0.54
BEFORE      0.53
Activations Density: 0.526%