INDEX
Explanations
mentions of universities and formal institutions
Negative Logits
å·           -0.17
bury         -0.15
à¹ģรม        -0.15
tura         -0.15
ament        -0.14
inux         -0.14
unication    -0.14
ikes         -0.14
lav          -0.14
ahl          -0.14
Positive Logits
jak          0.15
aight        0.15
porter       0.15
indexer      0.14
Stranger     0.14
Nath         0.13
hart         0.13
chen         0.13
cos          0.13
utto         0.13
Activations Density 0.119%