INDEX
Explanations
mentions of the word "rap" at varying strengths of activation
instances of the word "rap" in various contexts
Negative Logits
wills                -0.75
Takeru               -0.67
bright               -0.63
Mississ              -0.62
recall               -0.62
Jong                 -0.59
Schwarz              -0.59
Lauder               -0.58
Garg                 -0.58
natureconservancy    -0.58
Positive Logits
odcast     1.15
rap        1.09
hene       0.96
olitics    0.94
heny       0.94
artisan    0.89
ixel       0.88
inion      0.87
ascal      0.86
oline      0.85
Activations Density: 0.007%