Explanations
instances of the word "knew" indicating prior knowledge or awareness
Negative Logits
azzi     -0.18
alley    -0.18
ussia    -0.18
olina    -0.17
ří       -0.16
oucher   -0.15
hang     -0.15
uss      -0.15
aiser    -0.15
hung     -0.15
Positive Logits
lify          0.16
IRECTION      0.15
pornografia   0.14
прес          0.13
ymm           0.13
mnop          0.13
env           0.13
REAM          0.13
fov           0.13
tables        0.13
Activations Density: 0.008%