Explanations
references to high-profile individuals and their achievements
Negative Logits
Token      Logit
Meng       -0.19
Hus        -0.15
Donne      -0.15
Wheeler    -0.15
388        -0.15
Bach       -0.15
ird        -0.14
IRD        -0.14
858        -0.14
hack       -0.14
Positive Logits
Token        Logit
Harry        0.54
Harry        0.50
Potter       0.48
Pot          0.38
Rowling      0.38
wizard       0.37
Pot          0.37
Hogwarts     0.36
Dumbledore   0.35
pot          0.35
Activations Density: 0.021%