Explanations
instances of the word "aid" at varying activation levels
references to humanitarian aid
Negative Logits
Bellev          -0.80
Beard           -0.67
Ran             -0.66
theless         -0.65
Mamm            -0.65
é¾              -0.64
aber            -0.63
archived        -0.61
unforgettable   -0.61
Wilde           -0.61
Positive Logits
aid             1.37
glers           1.10
Aid             1.10
Aid             0.97
aids            0.92
uese            0.89
maid            0.83
ーテ            0.81
aid             0.79
Reviewer        0.78
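The page does not say how the logit lists were produced. Dashboards like this one commonly project a feature's decoder direction through the model's unembedding matrix and then list the most negative and most positive tokens. The sketch below assumes that method and ignores any final layer-norm scaling. `logit_effects`, `top_and_bottom`, and the toy inputs are hypothetical and were not taken from the dashboard.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Direct logit effect of a feature on every vocabulary token.

    Assumption: effect[t] = sum_i decoder_dir[i] * unembed[i][t],
    i.e. the decoder direction projected through W_U (d_model x vocab).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of the k largest and k smallest effects."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["aid", "Aid", "Bellev", "the"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.6, 0.0],
    [-0.5, -0.4, 0.4, 0.1],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
# positive: aid, Aid; negative: Bellev, the
```

Because the lists are a pure function of the weights, they describe what the feature pushes the model to predict, independently of the contexts in which it fires.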
Activation density: 0.020%
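Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.020% means the feature fires on about 2 of every 10,000 tokens. The page does not state its threshold or sample, so the following is a minimal sketch. `activation_density` and the toy activations are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy sample: the feature fires on 2 of 10,000 tokens.
acts = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.020%
```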