Explanations
words related to prizes or rewards
references to prizes and their significance
Negative Logits
heads        -0.81
Hurricanes   -0.81
tracks       -0.78
Leopard      -0.75
Leilan       -0.74
ocker        -0.74
head         -0.74
ãĤ¨ãĥ«       -0.72
Situation    -0.71
enegger      -0.70
Positive Logits
pri          1.24
etary        1.05
eties        1.03
eteen        1.00
ety          0.95
eteenth      0.91
hon          0.91
İ            0.90
¦            0.87
ĸļ           0.86
Activations Density 0.005%