Explanations
expressions of gratitude or relief
expressions of gratitude and appreciation
Negative Logits
intensive        -0.73
umes             -0.68
amental          -0.68
igate            -0.62
soever           -0.62
inappropriately  -0.61
hotly            -0.60
excessively      -0.60
urst             -0.60
abus             -0.60
Positive Logits
survived         1.17
spared           1.08
escaped          0.98
overcame         0.90
managed          0.88
saved            0.87
luckily          0.87
reen             0.85
intervened       0.84
survives         0.84
Activations Density 0.326%