Explanations
Mentions of the word "scan" or related variations in the context of research or observation.
Negative Logits
Token          Logit
pengu          -0.67
Kut            -0.64
manslaughter   -0.63
Yog            -0.59
Cind           -0.59
laure          -0.58
ment           -0.58
Aid            -0.57
rapp           -0.56
Masquerade     -0.56
Positive Logits
Token          Logit
lator          1.19
lations        1.08
ning           1.05
nery           0.96
lan            0.94
lon            0.93
lation         0.91
scans          0.89
ner            0.87
scan           0.87
Activations Density: 0.062%
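
The page does not say how the density figure is computed. A common reading is the fraction of token positions on which the feature fires at all, and the sketch below follows that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 10,000 token positions.
sample = [0.0] * 9999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.062% would correspond to the feature being active on roughly 6 out of every 10,000 tokens.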