INDEX
Explanations
sentences about personal opinions or experiences
Negative Logits
Token     Logit
opez      -0.76
elve      -0.74
luaj      -0.74
dies      -0.72
asers     -0.69
styles    -0.69
aneers    -0.67
ãĤ©       -0.66
letes     -0.65
chev      -0.65
Positive Logits
Token         Logit
happening     0.97
definitely    0.90
NOT           0.85
supposed      0.84
gonna         0.84
unacceptable  0.81
nt            0.80
truly         0.79
not           0.79
purely        0.77
Activations Density: 0.117%