INDEX
Explanation: universities and their schools
Negative Logits
Token         Logit
Afghans       0.75
murderers     0.72
bepa          0.71
confusion     0.71
[not shown]   0.71
windows       0.70
impor         0.68
kon           0.67
fantasia      0.67
shortcut      0.66
Positive Logits
Token         Logit
Üniversitesi  1.05
大学          0.90
University    0.86
University    0.84
Sciences      0.80
economist     0.80
университета  0.78
Science       0.76
science       0.75
Science       0.75
Activations Density: 0.002%