Explanations
familiarity and commonality
Negative Logits
<0x0D>  0.98
?  0.94
hidrat  0.77
financi  0.75
لي  0.73
an  0.72
ed  0.72
</u>  0.71
</h2>  0.70
amit  0.70
Positive Logits
familiar  0.95
다  0.93
familiarity  0.93
system  0.92
ר  0.86
রার  0.86
unfamiliar  0.85
accustomed  0.84
amiliar  0.84
ა  0.83
Activations Density: 0.018%