INDEX
Explanations
phrases related to personal experiences and emotions
Negative Logits
tein      -0.87
ifully    -0.73
moil      -0.72
voy       -0.71
iful      -0.66
vic       -0.66
raltar    -0.64
tex       -0.64
ERY       -0.63
çļ        -0.62
Positive Logits
bothered      1.00
bother        0.95
anymore       0.93
bothering     0.89
cared         0.84
hin           0.80
mattered      0.79
fit           0.79
appreciated   0.76
anywhere      0.74
Activations Density 0.032%