Explanations
recurring themes or trends in societal behavior and norms
Negative Logits
****************************************************************************  -0.16
uled  -0.15
ories  -0.15
iangle  -0.15
enburg  -0.15
RICT  -0.14
ouro  -0.14
interes  -0.14
interest  -0.14
еров  -0.13
Positive Logits
normal  0.33
norm  0.29
NORMAL  0.29
-normal  0.29
normal  0.28
routine  0.28
Normal  0.28
норм  0.28
part  0.27
normalize  0.26
Activations Density 0.246%