INDEX
Explanations
instances of the word "admit" and its variations
Negative Logits
rior          -0.76
ILCS          -0.72
lets          -0.67
tsky          -0.66
miah          -0.66
atility       -0.64
tein          -0.62
chn           -0.61
Kinnikuman    -0.60
colo          -0.59
Positive Logits
defeat            1.03
wrongdoing        0.99
guilt             0.89
ibility           0.84
iary              0.79
mistakes          0.78
fault             0.73
responsibility    0.71
admitting         0.70
weakness          0.69
Activations Density: 0.071%