INDEX
Explanations
the word "unfamiliar" or related variations
terms related to unfamiliarity and inexperience
Negative Logits
ramid      -0.92
owder      -0.77
raq        -0.74
die        -0.73
trak       -0.72
roxy       -0.71
essen      -0.67
di         -0.67
aminer     -0.67
illance    -0.67
Positive Logits
unfamiliar      1.38
ity             0.88
familiar        0.77
lihood          0.75
arous           0.75
uously          0.74
acquaintance    0.72
sworth          0.68
etheless        0.67
strangers       0.67
Activations Density 0.007%