INDEX
Explanations
specific phrases and terminology related to societal structures and classifications
Head Attr Weights
0:0.02
1:0.03
2:0.06
3:0.04
4:0.02
5:0.05
6:0.10
7:0.34
8:0.09
9:0.03
10:0.07
11:0.09
Negative Logits
sidx:-1.46
etheus:-1.33
untled:-1.31
hesda:-1.25
obos:-1.22
conclud:-1.17
livest:-1.11
externalToEVAOnly:-1.10
inflamm:-1.10
Qiao:-1.09
Positive Logits
yacht:1.33
clubhouse:1.00
Neander:0.98
ework:0.98
ヘ:0.97
Marine:0.97
eur:0.93
Slater:0.93
Monk:0.92
surf:0.92
Activations Density 0.696%