Explanations
phrases related to unexpected or undesired situations in which individuals or groups find themselves
phrases related to personal experiences and self-reflection
Negative Logits
alez      -0.80
Mub       -0.77
imize     -0.69
pour      -0.67
vert      -0.67
ritz      -0.67
oslov     -0.65
cemic     -0.65
grade     -0.65
Mehran    -0.63
Positive Logits
selves      0.78
tremend     0.74
peria       0.73
anew        0.71
CHAT        0.71
æ³          0.68
personally  0.68
wanting     0.66
ANS         0.65
creatively  0.65
Activations Density 0.041%