INDEX
Explanations
the word "California"
references to California
Negative Logits
Netflix    -0.80
AIDS       -0.75
Match      -0.71
Hoff       -0.71
listed     -0.70
Pirate     -0.70
jobs       -0.69
Turing     -0.67
Hulu       -0.66
anded      -0.65
Positive Logits
cal        4.05
Cal        1.55
cal        1.54
CAL        1.51
Cal        1.28
calc       1.04
sul        1.02
tem        0.98
cu         0.97
cam        0.95
Activations Density: 0.015%