INDEX
Explanations
references to pairs or sets of things that are similar or related
references to twins
Negative Logits
UME       -0.78
anwhile   -0.73
CoC       -0.72
WI        -0.70
andise    -0.68
arcity    -0.68
ugu       -0.66
Aware     -0.65
Clin      -0.64
Explain   -0.64
Positive Logits
ning      0.96
ned       0.94
twin      0.90
brother   0.85
brothers  0.80
towers    0.76
twins     0.75
sister    0.75
fold      0.74
hered     0.70
Activations Density: 0.008%