INDEX
Explanations
acronyms starting with "TH" followed by a number
instances of the word "TH" and variations of "Thing."
Negative Logits
Token       Logit
Libre       -0.77
iste        -0.68
angelo      -0.66
alia        -0.65
Cam         -0.64
inka        -0.62
shepherd    -0.62
nell        -0.62
ello        -0.62
Mil         -0.62
Positive Logits
Token       Logit
TH          3.69
TH          1.85
Th          1.54
Th          1.53
THR         1.44
WH          1.35
KN          1.28
Than        1.28
TW          1.27
STEP        1.27
Activations Density: 0.014%