Explanations
words related to actions and events
instances of actions or experiences involving social interaction and personal perception
Negative Logits
Stability      -0.71
insula         -0.67
feasibility    -0.67
tein           -0.64
integration    -0.64
convergence    -0.62
Combine        -0.61
LF             -0.60
shader         -0.60
ces            -0.57
Positive Logits
aback          0.95
oing           0.80
ĸļ             0.80
inged          0.77
ãĤ¼            0.74
ptin           0.74
©¶æ¥µ          0.74
zeb            0.72
anked          0.71
chy            0.71
Activations Density: 0.293%