INDEX
Explanations
words related to family and social relationships
mentions of relatives and family connections
Negative Logits
oted          -0.77
iasis         -0.69
SOC           -0.63
boots         -0.62
headed        -0.60
Capitalism    -0.59
Loch          -0.58
avery         -0.57
abad          -0.57
JS            -0.57
Positive Logits
relatives       0.87
hips            0.85
llah            0.84
remem           0.82
hetical         0.81
hetically       0.79
arten           0.78
acquaintances   0.77
describ         0.73
confir          0.71
Activations Density: 0.086%