INDEX
Explanations
mentions of specific individuals or entities associated with awards or notable achievements
Negative Logits
TWA           -0.87
Monfieur      -0.84
internetowa   -0.81
pleaſure      -0.77
Langu         -0.76
cauſe         -0.75
Bask          -0.75
behavi        -0.73
Wink          -0.73
Abit          -0.73
Positive Logits
Bbb           0.91
}:=           0.84
racuse        0.82
Walsh         0.81
alignItems    0.81
PPT           0.80
∈             0.77
ak            0.77
Walsh         0.77
—             0.75
Activations Density: 0.235%