INDEX
Explanations
references to requirements or necessities of individuals or groups
Negative Logits
âĸ¬            -0.71
é¾             -0.70
Bom            -0.68
ourning        -0.66
rook           -0.62
å°Ĩ            -0.60
rabbit         -0.59
ws             -0.59
Guilty         -0.58
wikipedia      -0.57
Positive Logits
cale           0.90
lessly         0.86
giving         0.80
dictate        0.74
fulfilled      0.69
igslist        0.69
afe            0.68
domestically   0.68
constrained    0.66
etting         0.66
Activations Density 0.027%