INDEX
Explanations
phrases indicating similarity or sameness
Negative Logits
propia     -0.63
propio     -0.61
préfère    -0.60
préfé      -0.60
льше       -0.60
surla      -0.58
apalagi    -0.57
@"/        -0.57
alguna     -0.57
meglio     -0.56
Positive Logits
thing      1.37
exact      1.23
exact      0.98
Exact      0.98
EXACT      0.97
thing      0.91
THING      0.90
amount     0.87
coisa      0.79
Exact      0.77
Activations Density: 0.200%