INDEX
Explanations
phrases that indicate personal reflections and opinions
Negative Logits
deaux: -0.15
letcher: -0.15
шин: -0.14
acked: -0.14
540: -0.14
McKay: -0.14
mass: -0.14
бора: -0.13
246: -0.13
zb: -0.13
Positive Logits
something: 0.39
ones: 0.37
something: 0.35
Something: 0.33
Something: 0.31
omething: 0.29
areas: 0.21
Ones: 0.21
ones: 0.20
一种: 0.20
Activations Density 0.220%