INDEX
Explanations
words related to recognition or honor
references to the name "John" and variations of its spelling
Negative Logits
Tube        -0.76
Tube        -0.75
needles     -0.75
20439       -0.75
panicked    -0.74
blob        -0.71
Panic       -0.70
chunks      -0.70
tube        -0.70
dense       -0.69
Positive Logits
hon         3.60
Honor       3.11
Hon         2.87
Honour      2.26
honor       2.24
honour      2.14
honorable   2.06
dishon      1.95
Hon         1.86
honors      1.77
Activation Density: 0.024%