INDEX
Explanations
phrases related to decisions or statements made by individuals or organizations
references to decisions, rulings, and announcements in a formal context
Negative Logits
behav        -0.63
liking       -0.60
dying        -0.60
Joined       -0.59
hobbies      -0.58
hemor        -0.58
ghost        -0.57
envy         -0.56
Unix         -0.56
training     -0.55
Positive Logits
underscores   1.13
signifies     1.07
reinforces    1.03
represents    1.03
proves        1.00
illustrates   0.99
demonstrates  0.99
reminds       0.98
reflects      0.95
brings        0.93
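The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). One common way to produce such lists is to project the feature's decoder direction onto each token's column of the model's unembedding matrix. This page does not confirm that method, so treat it as an assumption. The sketch below uses hypothetical names (`feature_logit_effects`, `decoder_dir`, `unembed`) and toy numbers, not real model weights.

```python
def feature_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature's decoder direction with
    each token's unembedding column (assumed method, toy data only).

    decoder_dir: list of floats, one per residual-stream dimension.
    unembed: list of rows, one per residual dimension; unembed[i][t] is the
             weight from dimension i to token t.
    vocab: list of token strings, one per unembedding column.
    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Logit contribution of this feature's direction to token t.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                              # most suppressed first
    positive = list(reversed(ranked[-k:]))             # most promoted first
    return negative, positive


# Toy example with a 2-dimensional residual stream and 4 tokens.
neg, pos = feature_logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[1.0, 0.0, 2.0, -1.0],
             [0.0, 1.0, 0.0, 2.0]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -2.0), ('b', -0.5)]
print(pos)  # [('c', 2.0), ('a', 1.0)]
```

Under this reading, a large positive score such as the one for "underscores" means the feature direction aligns with that token's unembedding, so activating the feature raises that token's logit.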
Activation Density: 0.484%
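Activation density is read here as the percentage of token positions on which the feature fires. At 0.484%, that is fewer than 1 in 200 tokens. The sketch below shows this calculation. It assumes "fires" means an activation strictly above zero, because the exact threshold the dashboard uses is not stated. The helper name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's
    documented setting.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 5 token positions.
print(activation_density([0.0, 0.0, 0.3, 0.0, 1.2]))  # 40.0
```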