INDEX
Explanations
phrases indicating a collective or group action
references to collective actions or concepts
Negative Logits
Token          Logit
confirmation   -0.65
Lovely         -0.59
rick           -0.59
disguise       -0.58
old            -0.57
jar            -0.55
ss             -0.55
removal        -0.55
old            -0.54
replacement    -0.54
Positive Logits
Token          Logit
collectively   3.78
individually   1.62
collective     1.55
jointly        1.50
respectively   1.30
collect        1.29
together       1.28
together       1.24
unanimously    1.19
toget          1.09
Activations Density: 0.016%
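
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the toy activation list are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 6250 token positions.
toy_activations = [0.0] * 6249 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.016% corresponds to the feature firing on roughly 1 in every 6,250 tokens, which is consistent with a sparse feature tied to a narrow set of words such as "collectively".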