INDEX
Explanations
references to historical events or a person's personal history
references to historical events or the concept of the past
Negative Logits
anguage       -0.78
wagon         -0.75
shapeshifter  -0.73
hyde          -0.67
mand          -0.66
lee           -0.64
NEY           -0.64
ately         -0.63
Downloadha    -0.63
zhen          -0.63
Positive Logits
ebin          1.55
iche          1.17
tense         1.05
ures          1.01
ime           1.00
orate         0.92
imes          0.91
generations   0.84
oral          0.80
ure           0.77
Activations Density: 0.029%