INDEX
Explanations
dates or time-related expressions
references to a specific character or figure in a narrative, particularly with names or titles
Negative Logits
Beir         -0.77
hypers       -0.70
itionally    -0.66
Claus        -0.65
arella       -0.64
ivid         -0.64
killer       -0.64
Cind         -0.63
agne         -0.63
LAPD         -0.63
Positive Logits
ō            1.11
nen          1.04
··           0.98
¬            0.97
ū            0.93
su           0.91
nin          0.90
shi          0.90
rates        0.85
jin          0.84
Activations Density 0.007%