INDEX
Explanations
phrases that indicate challenges or competitions
Negative Logits
ewis        -0.16
амп         -0.15
OnInit      -0.14
620         -0.14
Sınıf       -0.14
Priority    -0.13
apon        -0.13
igne        -0.13
Mei         -0.13
arg         -0.13
Positive Logits
challenge      0.86
Challenge      0.74
challenge      0.72
challenges     0.70
Challenge      0.69
challenged     0.66
Challenges     0.60
chall          0.56
challenging    0.55
_challenge     0.54
Activations Density: 0.116%