INDEX
Explanations
the word "are" followed by a word
the phrase "you are" in various contexts
Negative Logits
osate       -0.82
ESE         -0.67
uish        -0.64
ð           -0.64
FY          -0.63
ionics      -0.62
udeau       -0.61
Restore     -0.60
Rove        -0.60
suffice     -0.60
Positive Logits
gonna       0.93
nt          0.93
able        0.89
yourself    0.85
lucky       0.83
choosing    0.80
intimately  0.79
willing     0.79
supposed    0.78
going       0.78
Activations Density: 0.139%
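
The density figure can be read as the share of token positions on which the feature fires. The page does not state its exact definition, so the sketch below assumes the common one: the fraction of positions whose activation exceeds a threshold (zero here). The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a 0.139% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature is active.

    Assumption: "active" means activation > threshold; the source page
    does not say which definition it uses.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 139 of them active.
acts = [0.0] * 99_861 + [1.5] * 139
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.139% means the feature is active on roughly 1 in 700 tokens, which fits a narrow feature tied to phrases such as "you are".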