Explanations
words with the prefix "rid" followed by a number indicating the activation strength
instances of the word "ridiculous."
Negative Logits
largeDownload   -0.75
whales          -0.68
AU              -0.66
Prospect        -0.62
¬¼              -0.60
Markets         -0.59
Eagle           -0.58
SUP             -0.58
ULT             -0.56
HM              -0.56
Positive Logits
rid             1.18
gew             1.03
gements         1.02
anus            1.00
gment           0.94
ifiers          0.93
gments          0.93
gement          0.90
ged             0.87
nesses          0.85
Activations Density 0.003%