INDEX
Explanations
names of individuals
repeated mentions of specific names and unique identifiers in the text
Negative Logits
istics     -0.86
parts      -0.77
ARD        -0.70
à¦         -0.70
ãģį        -0.69
fry        -0.68
ariat      -0.67
WATCHED    -0.65
Reincarn   -0.64
membr      -0.63
Positive Logits
Jed        1.20
seys       1.02
ouble      0.80
hua        0.80
arkin      0.79
lik        0.79
rus        0.76
ko         0.75
warm       0.74
ealous     0.74
Activations Density: 0.015%