INDEX
Explanations
references to social sciences and related academic disciplines
Negative Logits
esseract    -0.16
shi         -0.16
izzard      -0.16
imir        -0.15
rosso       -0.15
raf         -0.15
icerca      -0.14
eczy        -0.14
ober        -0.14
lei         -0.14
Positive Logits
ucha         0.17
alon         0.16
Hubb         0.15
Wich         0.15
/scripts     0.14
erties       0.14
erti         0.14
кав          0.14
är           0.14
fre          0.14
Activations Density: 0.025%