INDEX
Explanations
references to specific scenes or settings in narratives
Negative Logits
Token     Logit
standing  -0.17
deaux     -0.16
lander    -0.15
inç       -0.15
aload     -0.15
ters      -0.15
udge      -0.15
achi      -0.14
nger      -0.14
kiem      -0.14
Positive Logits
Token     Logit
uate      0.17
ýš        0.17
人çī©     0.15
rack      0.15
ed        0.15
Ø©        0.15
eker      0.15
eg        0.14
antro     0.14
adan      0.14
Activations Density: 0.038%