INDEX
Explanations
profane language
exclamatory and profane expressions of frustration or anger
Negative Logits
Token           Logit
rece            -0.68
obook           -0.68
uchin           -0.64
obser           -0.64
HCR             -0.64
Interstitial    -0.63
dfx             -0.62
ioxide          -0.62
oult            -0.62
uries           -0.60
Positive Logits
Token           Logit
wit             0.97
drivers         0.95
driver          0.94
yeah            0.94
nuts            0.93
ery             0.91
holes           0.90
fuck            0.89
yeah            0.89
ers             0.88
Activations Density 0.035%
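
As a rough illustration of how a density figure like the one above might be read, the sketch below assumes that "activations density" means the percentage of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are all assumptions made for illustration; the page itself does not state how the number is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; not taken from the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 7 active positions out of 20,000 tokens.
toy = [0.0] * 19993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(toy)
print(f"{density:.3f}%")  # prints "0.035%"
```

Under this reading, a density of 0.035% would correspond to the feature firing on roughly 3 or 4 tokens out of every 10,000.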