INDEX
Explanations
phrases related to reasons or justifications
references to individuals and their actions or situations
Negative Logits
----------------------------------------------------------------  -0.70
ortun  -0.66
nutshell  -0.66
GGGGGGGG  -0.65
UGE  -0.64
venge  -0.64
renaissance  -0.63
unveiling  -0.63
--------------------------------  -0.62
toget  -0.62
Positive Logits
lacked  1.58
hadn  1.55
disagreed  1.32
objected  1.26
refused  1.25
wasn  1.23
lacks  1.23
feared  1.19
doubted  1.17
didn  1.16
Activations Density 0.370%