Explanations
references to close friendships
Negative Logits
Token         Logit
Fellow        -0.73
teammate      -0.60
Fellow        -0.60
fellow        -0.57
FELLOW        -0.55
predecessor   -0.50
fellow        -0.49
Partner       -0.46
compañero     -0.46
colleague     -0.45
Positive Logits
Token         Logit
friends       1.23
friend        0.96
fre           0.91
friends       0.91
Friends       0.91
fri           0.89
Friends       0.83
buds          0.81
frie          0.81
vrienden      0.80
Activations Density: 0.190%