INDEX
Explanations
references to influential figures and their ideas or theories
Negative Logits
Completed    -0.17
Done         -0.15
Ñĥз          -0.14
chaired      -0.14
caused       -0.14
áli          -0.14
done         -0.14
llen         -0.14
iei          -0.13
performed    -0.13
Positive Logits
advanced     0.54
advanced     0.45
Advanced     0.38
Advanced     0.36
_advanced    0.30
prop         0.30
prom         0.28
avanz        0.28
esp          0.27
put          0.25
Activations Density: 0.322%