Explanations
phrases related to making mistakes or causing trouble
Negative Logits
Token           Logit
vation          -0.91
Interstitial    -0.85
rica            -0.69
uti             -0.64
ãĥ´             -0.63
Citation        -0.62
Ļ               -0.61
eele            -0.60
Ĺ               -0.58
oko             -0.58
Positive Logits
Token           Logit
around          0.98
around          0.97
havoc           0.97
Around          0.85
driver          0.77
bley            0.75
Around          0.75
up              0.71
ily             0.71
ishly           0.70
Activations Density: 0.081%