INDEX
Explanations
descriptions or comparisons of experiences
phrases expressing subjective experiences and emotions
Negative Logits
Token       Logit
ahime       -0.79
phabet      -0.69
luster      -0.69
ukong       -0.65
ãĥĩãĤ£      -0.65
\\\\\\\\    -0.64
andise      -0.62
artifacts   -0.61
Americ      -0.61
Winner      -0.61
Positive Logits
Token            Logit
to               0.73
raining          0.72
tical            0.63
everyday         0.63
icult            0.61
ta               0.59
psychologically  0.58
folly            0.57
uphill           0.55
aliens           0.55
Activations Density: 0.132%