INDEX
Explanations
references to relationships and connections with others
Negative Logits
themſelves  -0.50
saddles     -0.49
itſelf      -0.47
Beide       -0.47
οπο         -0.46
rootNode    -0.45
Автор       -0.43
rot         -0.43
keduanya    -0.43
대로        -0.43
Positive Logits
fellow      1.23
peers       1.01
fellow      1.01
colleagues  0.98
teammate    0.95
rivals      0.90
Fellow      0.90
colleague   0.90
partner     0.89
mates       0.88
Activations Density 0.220%