Explanations
instances where the word 'plight' is used
terms related to hardship or suffering
Negative Logits
bindings   -0.67
weights    -0.66
nucle      -0.65
neutral    -0.60
tein       -0.59
subp       -0.58
latt       -0.57
BST        -0.57
ENC        -0.57
OH         -0.57
Positive Logits
hooting    0.83
plag       0.82
ufact      0.80
cape       0.76
havoc      0.72
ours       0.71
doms       0.71
stadt      0.71
miser      0.69
endured    0.69
Activations Density 0.051%