INDEX
Explanations
themes related to personal relationships and the consequences of individual actions
Negative Logits
ブリ     -0.15
щей      -0.15
rig      -0.15
oui      -0.14
rod      -0.14
omit     -0.14
tg       -0.14
Moor     -0.14
ordan    -0.14
mou      -0.13
Positive Logits
still    0.21
still    0.19
Still    0.18
Still    0.16
873      0.15
STILL    0.15
437      0.15
stil     0.14
nog      0.14
noch     0.14
Activations Density 0.423%