INDEX
Explanations
phrases related to the history or track record of individuals or entities
Negative Logits
ishable    -0.74
Flow       -0.71
wolves     -0.70
Sport      -0.69
ateurs     -0.68
ij士       -0.66
uri        -0.65
Stars      -0.64
idden      -0.64
ishers     -0.63
Positive Logits
dating       0.93
spanning     0.82
revolving    0.81
associ       0.77
of           0.74
dealing      0.74
supporting   0.74
favoring     0.73
advocating   0.73
exposing     0.73
Activations Density: 0.053%