Explanations
words related to irreversibility or being beyond repair
Negative Logits
GD         -0.73
Nights     -0.70
ucket      -0.67
KER        -0.65
anwhile    -0.65
HR         -0.64
Butterfly  -0.63
OHN        -0.62
creen      -0.62
arta       -0.62
Positive Logits
voc        1.41
parable    1.30
ceivable   1.09
utive      0.96
hibited    0.95
serious    0.92
vious      0.90
medi       0.89
itable     0.89
itive      0.89
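The logit scores above describe how strongly the feature pushes each token up or down in the next-token prediction. A minimal sketch of how such scores are commonly computed, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The function names and the toy inputs are hypothetical, not the real model weights.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token by decoder_vec . unembed[:, j].

    Assumption: unembed is a d_model x vocab_size matrix given as a list of rows,
    and decoder_vec is the feature's d_model-dimensional decoder direction.
    """
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [[0.2, -1.0, 0.5],
           [0.4, 0.0, -1.0]]
scores = logit_effects(decoder_vec, unembed, ["a", "b", "c"])
negative, positive = top_and_bottom(scores, k=1)
print(negative, positive)  # [('b', -1.0)] [('c', 1.0)]
```

Under this reading, subword tokens such as "parable" (as in "irreparable") and "voc" (as in "irrevocable") rank highest because the feature's decoder direction aligns with their unembedding vectors.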
Activation Density: 0.099%
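An activation density of 0.099% means the feature fires on roughly 1 in 1,000 token positions. A minimal sketch, assuming density is defined as the fraction of sampled tokens whose feature activation is strictly above zero (the exact threshold and sample used by this dashboard are not stated here). The function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# One firing position out of 1,000 gives a density of 0.1%.
acts = [0.0] * 999 + [2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.100%
```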