INDEX
Explanations
descriptions of social settings and interactions
Negative Logits
stal      -0.16
ht        -0.15
outgoing  -0.15
heim      -0.15
stad      -0.15
Marg      -0.14
ply       -0.14
enberg    -0.14
ocrine    -0.14
hills     -0.14
Positive Logits
inside     0.39
Inside     0.39
Inside     0.38
inside     0.36
indoors    0.33
interior   0.32
_inside    0.30
interiors  0.28
indoor     0.28
內         0.27
Activations Density 0.145%
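
The density figure above can be read as the share of token positions on which the feature fires. The sketch below is an illustration under that assumption only: the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active; the dashboard's exact definition may differ.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 2,000 token positions, 3 of which activate the feature.
example = [0.0] * 2000
for position, value in [(10, 0.7), (500, 1.2), (1999, 0.3)]:
    example[position] = value

density = activation_density(example)
print(f"{density:.3%}")
```

Under this reading, a density of 0.145% would correspond to roughly one or two active positions per thousand tokens.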