Explanations
references to a specific person or character named "He."
Negative Logits
Token       Logit
omial       -0.18
weep        -0.15
th          -0.15
y           -0.15
liv         -0.14
っと        -0.14
ety         -0.14
onne        -0.14
mag         -0.14
yen         -0.14
Positive Logits
Token       Logit
idelberg    0.22
isman       0.22
bron        0.22
brew        0.21
imat        0.21
inz         0.20
/she        0.20
aviest      0.20
fce         0.20
avit        0.18
Activations Density 0.018%