INDEX
Explanations
references to going out or participating in social activities
Negative Logits
off     -0.17
chr     -0.17
ing     -0.15
plex    -0.15
osc     -0.15
ноз     -0.15
Chr     -0.14
ove     -0.14
go      -0.14
inst    -0.14
Positive Logits
wards       0.21
doors       0.18
SIDE        0.17
Svc         0.16
Into        0.16
placement   0.15
Wass        0.15
Ố           0.15
краї        0.15
ITTE        0.15
Activations Density 0.043%