Explanations
the word "phrase" followed by an intense activation value
recurring phrases or sentence structures
Negative Logits
Token        Logit
DERR         -0.84
ÄŁ           -0.83
©¶æ¥µ        -0.77
Thro         -0.73
Skydragon    -0.70
Adds         -0.68
fman         -0.68
Ka           -0.66
hari         -0.65
ntil         -0.63
Positive Logits
Token        Logit
phrase       1.07
ology        1.06
phrases      1.03
phrase       1.02
uttered      0.87
terday       0.86
witz         0.82
mith         0.77
atre         0.74
eting        0.74
Activations Density 0.016%