INDEX
Explanations
references to specific names, potentially related to people or places
references to popular figures in entertainment, specifically late-night hosts
Negative Logits
Token         Logit
pus           -0.70
cm            -0.69
loudspe       -0.66
fingerprint   -0.65
spe           -0.64
mer           -0.61
printing      -0.61
symbolic      -0.61
supervised    -0.60
acceler       -0.60
Positive Logits
Token         Logit
Fallon        4.24
Kimmel        1.63
Fall          1.16
Finn          1.00
Newport       0.98
Grande        0.95
vati          0.94
Downing       0.94
Fiona         0.93
raltar        0.90
Activations Density: 0.016%