INDEX
Explanations
references to light bulbs and electricity
Negative Logits
Token        Logit
verages      -0.62
politics     -0.62
Everett      -0.58
vich         -0.57
..........   -0.55
livion       -0.55
Mayo         -0.55
Volks        -0.55
Bund         -0.54
Hayward      -0.53
Positive Logits
Token        Logit
mith         0.93
hips         0.89
hops         0.78
uggest       0.73
hin          0.69
behind       0.67
creen        0.67
poons        0.65
earch        0.64
pec          0.64
Activations Density: 15.768%