INDEX
Explanations
references to the "Harry Potter" series and related characters
Negative Logits
Meng        -0.16
NC          -0.15
808         -0.14
Nu          -0.14
Abstract    -0.14
ird         -0.14
388         -0.14
209         -0.14
GT          -0.13
IRD         -0.13

Positive Logits
Harry       0.54
Harry       0.49
Potter      0.47
Rowling     0.40
HP          0.40
wizard      0.39
Hogwarts    0.37
Dumbledore  0.37
Snape       0.37
Hermione    0.36
Activations Density: 0.022%