INDEX
Explanations
phrases indicating familiarity or recognition
references to familiarity with concepts or objects
Negative Logits
chance    -0.79
Chance    -0.72
scoring   -0.68
hap       -0.66
mpeg      -0.65
oner      -0.64
tan       -0.64
rate      -0.64
reme      -0.63
hemy      -0.63
Positive Logits
familiar       1.05
iliar          0.92
isable         0.88
faces          0.81
igan           0.81
idad           0.80
recognizable   0.79
iable          0.77
enough         0.77
unfamiliar     0.76
Activations Density: 0.012%