INDEX
Explanations
emotional and opinionated language regarding current events or social issues
Negative Logits
incorpor       -0.72
clitor         -0.66
Mirage         -0.65
synerg         -0.62
mounts         -0.61
perspect       -0.61
conduc         -0.60
hemor          -0.59
partnerships   -0.59
Antar          -0.59
Positive Logits
U+FE0F (variation selector-16)   1.34
──                               0.94
§                                0.87
ternity                          0.81
■                                0.77
────────                         0.77
€                                0.77
£                                0.77
────                             0.76
$                                0.75
Activation Density: 0.176%