INDEX
Explanations
phrases related to familiarity or recognition
references to familiarity or recognition
Negative Logits
ode           -0.65
break         -0.65
ovember       -0.64
milo          -0.64
ership        -0.62
!!!           -0.61
Values        -0.60
Extension     -0.60
Ended         -0.59
_(            -0.59
Positive Logits
familiar      3.81
unfamiliar    2.22
amiliar       2.21
familiarity   1.80
acquainted    1.71
accustomed    1.57
acquaint      1.52
recognizable  1.33
iliar         1.30
intimately    1.28
Activations Density: 0.031%