INDEX
Explanations
questions and expressions of uncertainty
Negative Logits
brook      -0.67
Brooks     -0.66
Recu       -0.62
BROOK      -0.62
castor     -0.61
Pollack    -0.60
Verge      -0.60
reminder   -0.59
COLS       -0.59
tráiler    -0.58
Positive Logits
what       1.71
what       1.54
WHAT       1.54
What       1.53
What       1.50
WHAT       1.47
quelles    0.93
wat        0.91
Τι         0.89
وما        0.85
Activations Density: 0.148%