INDEX
Explanations
references to pairs or groups, particularly the word "couple."
Negative Logits
Token           Logit
transQ          -0.53
écoulé          -0.52
참고            -0.50
estándares      -0.48
Numerade        -0.46
iodía           -0.46
galkan          -0.45
Universitaria   -0.44
Verhältnisse    -0.44
actéristi       -0.43
Positive Logits
Token           Logit
couple          0.90
pair            0.76
latter          0.66
couple          0.66
paio            0.59
Couple          0.58
pair            0.58
"""             0.58
Couple          0.55
PAIR            0.54
Activations Density: 0.063%