Explanations
No Explanations Found
Negative Logits
refugees        1.11
heroes          1.08
suppliers       1.02
monkeys         1.02
experts         1.01
soldiers        0.95
employees       0.95
brothers        0.94
judges          0.94
manufacturers   0.93
Positive Logits
5               0.99
1               0.88
.'              0.86
3               0.85
."              0.84
7               0.84
,"              0.83
Upon            0.83
single          0.82
4               0.82
Activations Density: 0.000%