INDEX
Explanations
statements about experiences and personal reflections
Negative Logits
ольно
-0.15
aga
-0.15
/Typography
-0.14
isteyen
-0.14
hers
-0.14
فات
-0.14
ailable
-0.13
ору
-0.13
봐
-0.13
uba
-0.13
Positive Logits
being
0.21
自己
0.18
although
0.16
it
0.16
people
0.16
having
0.15
Being
0.15
being
0.15
felt
0.15
feels
0.15
Activations Density 0.085%