Explanations
references to friends and social relationships
Negative Logits
Leer          -0.31
Policy        -0.29
createSlice   -0.29
Leona         -0.29
continu       -0.29
Warrior       -0.28
LastError     -0.28
Vision        -0.28
Leer          -0.28
vision        -0.28
Positive Logits
friends         0.80
friends         0.76
friend          0.72
friend          0.72
Friends         0.70
acquaintances   0.69
незавершена     0.68
vrienden        0.68
FRIENDS         0.68
Freundin        0.66
Activations Density 0.422%