INDEX
Explanations
names of political figures and locations
Negative Logits
Token            Logit
:(               -0.69
esthetic         -0.66
DIT              -0.64
fiction          -0.63
Redditor         -0.63
FIG              -0.62
pleasant         -0.62
":"/             -0.62
fff              -0.61
advertisement    -0.61
Positive Logits
Token            Logit
others           1.08
assorted         0.98
Kw               0.90
Zac              0.85
Tel              0.84
Trent            0.81
etc              0.80
Trey             0.80
etc              0.78
possibly         0.77
Activations Density: 0.121%