INDEX
Explanations
phrases related to personal beliefs and actions
references to social interactions and relationships
Negative Logits
CLASSIFIED
-0.64
å°Ĩ
-0.54
±
-0.53
ãĥŃ
-0.52
wcsstore
-0.52
æ©Ł
-0.52
ãĥĥãĥĪ
-0.51
axter
-0.47
Bundes
-0.47
NSA
-0.46
Positive Logits
whilst
0.84
because
0.75
.",
0.69
whereas
0.68
;
0.64
whenever
0.64
('
0.62
.;
0.59
.","
0.59
while
0.59
Activations Density 1.776%