Explanations
phrases related to first-time experiences and novelty
Negative Logits
thog            -0.53
lunches         -0.48
chut            -0.45
цо              -0.45
IGraphics       -0.43
ceres           -0.42
Lucius          -0.41
ძ               -0.40
assi            -0.40
אס              -0.40
Positive Logits
unfamiliar       1.02
newcomer         0.89
novice           0.86
newcomers        0.86
Personensuche    0.81
novices          0.81
inconn           0.79
beginner         0.79
Roskov           0.79
المعيارى         0.78
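The dashboard does not state how its logit lists are computed. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix. The most positive entries then become the "Positive Logits" and the most negative become the "Negative Logits". The sketch below is a minimal version of that idea. The names `w_dec_row`, `w_unembed` and `top_logit_effects` are hypothetical, and a real dashboard may also account for layer norm or other details.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on their logits.

    Assumptions (hypothetical, not taken from the dashboard):
      w_dec_row: shape (d_model,), the feature's decoder direction.
      w_unembed: shape (d_model, d_vocab), the model's unembedding matrix.
      vocab:     list of d_vocab token strings.
    Returns (negative, positive), each a list of (token, effect) pairs.
    """
    # Direct logit effect of the feature on every vocabulary token.
    effects = w_dec_row @ w_unembed  # shape (d_vocab,)
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a toy two-dimensional model, a decoder row of `[1.0, 0.0]` and unembedding rows `[[0.5, -0.2, 1.0, -0.4], [0.0, 0.0, 0.0, 0.0]]` over the vocabulary `["a", "b", "c", "d"]`, calling with `k=2` ranks `d` and `b` as most negative and `c` and `a` as most positive.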
Activation Density: 0.230%
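Activation density is usually read as the share of sampled tokens on which the feature fires. The dashboard does not define its threshold or sample, so both are assumptions here. The minimal sketch below treats any activation above zero as firing and reports the result as a percentage. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): "firing" means activation > threshold,
    with threshold defaulting to 0.0.
    """
    acts = np.asarray(activations, dtype=float)
    # Count tokens where the feature is active, then convert to a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For instance, a feature that fires on 23 of 10,000 sampled tokens has a density of 0.23%, matching the scale of the figure reported above.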