INDEX
Explanations
words related to specific cultural or religious references
Negative Logits
ding      -0.78
sburgh    -0.78
olson     -0.76
etsy      -0.70
assed     -0.67
intosh    -0.66
raltar    -0.65
lished    -0.64
iverpool  -0.64
igree     -0.64
Positive Logits
Dum       0.73
qi        0.73
Tao       0.72
Sabha     0.72
verse     0.72
plin      0.71
ze        0.71
iste      0.70
efully    0.70
jin       0.69
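The page does not say how the two logit lists above are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to multiply the feature's decoder vector by the model's unembedding matrix and read off the most negative and most positive entries. A minimal NumPy sketch; `top_bottom_logits`, the toy matrices, and the four-token vocabulary are all hypothetical:

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_u, vocab, k=10):
    """Project one feature's decoder direction onto the unembedding
    (assumed method) and return the k most positive and k most negative tokens."""
    scores = w_dec_row @ w_u          # shape: (vocab_size,)
    order = np.argsort(scores)        # ascending order of logit contribution
    bottom = [(vocab[i], float(scores[i])) for i in order[:k]]
    top = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example: d_model = 2, a hypothetical vocabulary of 4 tokens.
vocab = ["Tao", "qi", "sburgh", "olson"]
w_u = np.array([[1.0, 0.5, -1.0, -0.5],
                [0.0, 0.5,  0.0, -0.5]])
w_dec_row = np.array([0.7, 0.1])      # hypothetical decoder vector for one feature

top, bottom = top_bottom_logits(w_dec_row, w_u, vocab, k=2)
print("positive:", [t for t, _ in top])      # Tao, qi
print("negative:", [t for t, _ in bottom])   # sburgh, olson
```

Real dashboards may also normalize the decoder vector or apply the final layer norm first; the sketch omits both.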
Activation Density: 0.005%
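Assuming activation density means the fraction of tokens on which the feature's activation is above zero (the page does not define it), 0.005% works out to 0.00005, or about one token in 20,000. A minimal sketch of that arithmetic; `activation_density`, the zero threshold, and the synthetic activations are hypothetical:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the (assumed) threshold."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Synthetic data: 20,000 tokens, the feature fires on exactly one of them.
acts = np.zeros(20_000)
acts[123] = 3.2  # hypothetical single firing

density = activation_density(acts)
print(f"{density:.3%}")  # 1 / 20,000 = 0.005%
```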