Explanations
phrases related to irreversibility or permanent damage
terms related to irreversibility and permanent change
Negative Logits
ramid                 -0.80
Hunters               -0.77
anwhile               -0.74
ucket                 -0.73
guiActiveUnfocused    -0.71
Trials                -0.70
owler                 -0.69
wagen                 -0.67
Butterfly             -0.67
auri                  -0.67
Positive Logits
voc                    1.29
parable                1.02
irre                   0.96
viation                0.91
agan                   0.88
itable                 0.87
asonable               0.87
lev                    0.86
ality                  0.85
cover                  0.85
Activation Density: 0.012%