INDEX
Explanations
references to daily activities or routines
references to daily activities and schedules
Negative Logits
hett        -0.78
ngth        -0.77
emort       -0.75
ACTED       -0.70
obal        -0.68
Eag         -0.67
æ©          -0.67
ashtra      -0.66
oice        -0.66
icultural   -0.66
Positive Logits
dream       1.79
care        1.19
lights      1.10
break       1.07
light       1.01
nings       0.97
trip        0.94
long        0.90
BOOK        0.88
glass       0.87
Activations Density 0.084%