INDEX
Explanations
short phrases or sentences written by different individuals
references to individuals and their roles in various situations or events
Negative Logits
united      -0.72
coffers     -0.67
Dates       -0.66
trillions   -0.65
etting      -0.64
mortals     -0.63
disparate   -0.63
irs         -0.63
Tickets     -0.62
ĸļ          -0.62
Positive Logits
remarked    0.98
surn        0.97
commented   0.95
told        0.87
testified   0.87
wrote       0.83
who         0.81
nicknamed   0.77
named       0.76
complained  0.76
Activations Density: 0.258%