INDEX
Explanations
names or pronouns referring to people
references to people and their interactions
Negative Logits
Token         Logit
NetMessage    -0.45
Pwr           -0.38
ĵĺ            -0.34
WATCHED       -0.33
///           -0.30
VIDIA         -0.29
RESULTS       -0.28
ongyang       -0.28
Pastebin      -0.28
iosyn         -0.28
Positive Logits
Token         Logit
ividual       0.39
onward        0.33
iversary      0.32
orf           0.31
gins          0.30
fur           0.30
iazep         0.30
hower         0.29
lasts         0.29
ierre         0.29
Activations Density: 2.319%