INDEX
Explanations
phrases indicating similarity or identicality
instances of the word "same."
Negative Logits
ases      -0.80
rosso     -0.69
WI        -0.69
*=-       -0.69
arest     -0.68
rection   -0.67
rique     -0.67
xtap      -0.66
efully    -0.66
meet      -0.64
Positive Logits
thing     1.23
way       0.92
amount    0.88
exact     0.88
old       0.85
kind      0.84
damn      0.83
ol        0.82
size      0.81
sized     0.80
Activations Density 0.042%