INDEX
Explanations
references to living conditions and environments
Negative Logits
Token        Logit
orsi         -0.83
awarding     -0.69
stride       -0.67
hitting      -0.67
ovi          -0.67
brass        -0.66
icer         -0.63
asking       -0.63
Flavoring    -0.63
blance       -0.63
Positive Logits
Token        Logit
estates      0.85
tents        0.83
poverty      0.82
cramped      0.82
rented       0.80
mansion      0.79
caves        0.78
subsistence  0.78
hosp         0.77
limbo        0.77
Activations Density: 0.117%