INDEX
Explanations
mentions of specific names and titles, particularly in the context of academic or scientific references
Negative Logits
ùa      -0.21
asons   -0.19
aven    -0.19
aker    -0.17
outh    -0.17
IME     -0.17
indre   -0.17
ensa    -0.16
edi     -0.16
oria    -0.16
Positive Logits
ched    0.26
eller   0.19
ee      0.18
cc      0.17
roz     0.17
ches    0.17
ow      0.17
lod     0.16
incon   0.16
cd      0.16
Activations Density: 0.040%