Explanations
terms related to admiration or reverence towards someone or something
references to "idols" or figures of admiration and worship
Negative Logits
Token               Logit
aunder              -0.77
Dull                -0.77
RAW                 -0.74
20439               -0.72
--------------------------------------------------------  -0.70
~~~~~~~~~~~~~~~~    -0.68
Klu                 -0.67
utenberg            -0.66
xp                  -0.66
Attention           -0.65
Positive Logits
Token               Logit
idol                1.32
idols               1.05
Idol                0.94
sacrific            0.94
worshipped          0.93
ãħĭãħĭ              0.89
ãħĭ                 0.83
worship             0.82
worsh               0.78
zeb                 0.76
Activations Density 0.007%