Explanations
proper nouns or specific names
phrases that refer to knowledge or familiarity with various subjects or concepts
Negative Logits
itter       -0.79
ishers      -0.68
ŃĶ          -0.66
isco        -0.65
plet        -0.64
pex         -0.63
onding      -0.62
ishable     -0.61
TRY         -0.60
orem        -0.59
Positive Logits
lege           1.07
intimately     0.95
firsthand      0.92
ledge          0.88
nothing        0.80
nothing        0.78
ledged         0.77
existed        0.77
instinctively  0.73
anecd          0.73
Activations Density: 0.050%