Explanations
mentions of the word "Hum" with comparably high activations
references to humor or comedic elements
Negative Logits
cape          -0.71
Peaks         -0.71
---------     -0.71
EntityItem    -0.70
ãĤ´ãĥ³        -0.68
å§            -0.65
hips          -0.65
flagged       -0.64
Sands         -0.62
approve       -0.62
Positive Logits
pty           1.06
ility         1.04
orously       1.03
iday          1.02
undai         1.01
ankind        0.99
Hum           0.92
Hum           0.88
mers          0.87
obiles        0.86
Activations Density 0.012%
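
One plausible reading of the density figure is the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and do not come from the dashboard, which does not state how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active. The dashboard does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active tokens out of 25,000.
sample = [0.0] * 24997 + [1.3, 0.4, 2.1]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.012% would correspond to roughly 12 active tokens per 100,000 sampled, which fits a feature that responds to a narrow token family such as "Hum".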