Explanations
phrases indicating familiarity or habituation to experiences or situations
Negative Logits
myſelf        -0.75
وتسجيلات      -0.70
Hentet        -0.68
Reſ           -0.67
>=",          -0.67
ſind          -0.66
Diſ           -0.64
Houſe         -0.63
Krim          -0.62
ſtate         -0.62
Positive Logits
accustomed    1.03
привы         0.95
acostumb      0.91
gewohnt       0.82
familiar      0.77
habitude      0.74
habitu        0.73
习惯          0.72
habitual      0.72
familiar      0.69
Activations Density: 0.103%