INDEX
Explanations
phrases that categorize or describe types or kinds of things
Negative Logits
rotum          -0.86
Дереккөздер    -0.80
pleaſure       -0.80
myſelf         -0.80
houſe          -0.76
ſta            -0.74
ViewFeatures   -0.73
+#+#           -0.73
reaſon         -0.72
Roach          -0.71
Positive Logits
KIND           1.10
sort           1.06
kind           1.06
Kind           1.04
sorta          1.00
KIND           0.97
SORT           0.95
Sort           0.94
kind           0.94
Kind           0.91
Activations Density: 0.097%