INDEX
Explanations
phrases related to past events or behavior patterns
references to history and track records of entities or individuals
Negative Logits
ishable   -0.72
ando      -0.69
wagen     -0.67
oner      -0.66
uri       -0.66
idden     -0.65
asus      -0.63
itely     -0.63
agraph    -0.61
ower      -0.61
Positive Logits
revolving   0.76
dating      0.70
spanning    0.69
breaking    0.69
fraught     0.68
cles        0.68
favorable   0.67
stretching  0.67
brewing     0.67
acqu        0.67
Activations Density: 0.081%