INDEX
Explanations
references to a specific term "Jabari" or "Jab" at varying unique activations
occurrences of the name "Jabari" and related variants
Negative Logits
åĬ            -0.77
Constructed   -0.73
Beck          -0.72
åħī           -0.72
tenance       -0.71
Marble        -0.69
CONCLUS       -0.68
Inquisition   -0.67
IDENT         -0.67
istically     -0.67
Positive Logits
ber    0.99
rag    0.89
bler   0.89
rary   0.88
bing   0.87
ez     0.86
ril    0.85
bour   0.84
onis   0.83
deen   0.83
Activations Density 0.035%