Explanations
phrases related to the meaning or interpretation of words or concepts
Negative Logits
Token       Logit
tsy         -0.62
ython       -0.61
suicides    -0.56
anches      -0.56
oqu         -0.56
mention     -0.55
andy        -0.55
deaths      -0.54
misplaced   -0.53
wonder      -0.53
Positive Logits
Token       Logit
!.          0.80
Anyway      0.80
anyway      0.77
.</         0.77
anyways     0.76
!:          0.75
.[          0.73
.           0.71
.?          0.70
!?          0.69
Activations Density 0.224%