INDEX
Explanations
words related to forgiveness and acceptance
variations of the word "great"
Negative Logits
eers         -0.73
phrine       -0.62
shortened    -0.62
burg         -0.61
sexes        -0.61
mortem       -0.61
vitro        -0.60
doors        -0.60
messenger    -0.60
WER          -0.60
Positive Logits
illing       1.02
iffin        1.00
ille         1.00
ains         0.98
asp          0.98
rl           0.96
anting       0.96
iddle        0.94
gr           0.94
illed        0.94
Activations Density: 0.004%