INDEX
Explanations
phrases indicating social dynamics or societal conditions
NEGATIVE LOGITS
ЕС        -0.16
ients     -0.15
atsu      -0.15
/lg       -0.14
leta      -0.14
(iOS      -0.14
UNKNOWN   -0.14
unknown   -0.14
eliac     -0.14
unknown   -0.14
POSITIVE LOGITS
instead    0.48
instead    0.44
rather     0.37
Instead    0.36
Instead    0.34
вмест      0.33
statt      0.30
Rather     0.29
rather     0.29
Rather     0.26
Activation density: 0.008%
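Activation density is usually read as the share of token positions on which the feature is active, so 0.008% corresponds to roughly 8 firing positions per 100,000 tokens. The sketch below illustrates that arithmetic. It assumes "active" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 8 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 12_500] = 1.3

print(f"{activation_density(acts):.3f}%")  # prints "0.008%"
```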