INDEX
Explanations
phrases indicating a challenge or difficulty
Negative Logits
rongh        -0.73
roma         -0.62
Kings        -0.62
gemony       -0.59
irie         -0.57
notations    -0.56
oak          -0.56
threat       -0.55
bey          -0.55
aura         -0.55
Positive Logits
enough        0.97
consolation   0.76
entimes       0.76
imagining     0.73
coded         0.72
BALL          0.71
uphill        0.66
circumst      0.66
vain          0.65
ioned         0.65
Activations Density 0.028%