INDEX
Explanations
siblings, brother, sister, nieces, nephews
Negative Logits
family      0.99
Familie     0.95
Family      0.95
Family      0.93
famiglia    0.92
family      0.91
家人        0.90
familie     0.87
loved       0.86
가족        0.85
Positive Logits
brother     1.65
sister      1.57
Brother     1.52
brothers    1.47
sisters     1.47
Brothers    1.44
Sister      1.39
Brother     1.38
brother     1.36
sister      1.34
Activations Density: 0.034%