Explanations
phrases related to going against authority or the system
Negative Logits
Token        Logit
Cosponsors   -0.79
£ı           -0.69
aic          -0.68
Images       -0.66
reddits      -0.63
>:           -0.63
uid          -0.62
recip        -0.62
MpServer     -0.61
icion        -0.61
Positive Logits
Token        Logit
bounds       1.06
existence    0.83
nowhere      0.81
hiber        0.79
harms        0.78
trouble      0.74
shape        0.73
sight        0.72
captivity    0.72
bed          0.72
Activations Density: 0.047%