INDEX
Explanations
phrases related to perceptions and social narratives
Negative Logits
èlement   -0.68
OOTDTY   -0.66
devrez   -0.62
NSObject   -0.61
Revenir   -0.60
pter   -0.60
ברס   -0.60
ніципа   -0.59
czuk   -0.59
IRST   -0.59
Positive Logits
perception   1.01
perceptions   0.99
impression   0.92
Perceptions   0.85
Perception   0.80
Perception   0.75
印象   0.75
perception   0.74
impressions   0.73
stereotypes   0.73
Activations Density 0.242%