INDEX
Explanations
terms related to the subject itself
phrases that emphasize the concept of 'itself'
Negative Logits
ulton      -0.71
oos        -0.67
Moy        -0.66
Sierra     -0.65
im         -0.65
oslov      -0.63
ozy        -0.63
asia       -0.63
Frazier    -0.62
imag       -0.62
Positive Logits
tremend     1.08
proport     0.83
exting      0.80
conduc      0.80
ashamed     0.80
龍喚士      0.79
self        0.78
guarded     0.75
selves      0.74
contained   0.74
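The two lists above show the tokens with the most negative and most positive logit values for this feature. A minimal Python sketch of that selection, assuming the dashboard sorts a per-token logit vector and keeps the k lowest and k highest entries. The dictionary holds only a few values copied from the lists (a real vector would cover the whole vocabulary), and `top_and_bottom` is a hypothetical helper, not part of any dashboard API.

```python
import heapq

# A small subset of the logit values listed above (token -> logit).
# Assumption: the full version would have one entry per vocabulary token.
logits = {
    "ulton": -0.71,
    "oos": -0.67,
    "Moy": -0.66,
    "tremend": 1.08,
    "proport": 0.83,
    "self": 0.78,
    "selves": 0.74,
}


def top_and_bottom(scores, k):
    """Hypothetical helper: return the k highest and k lowest (token, value) pairs."""
    # nlargest/nsmallest avoid fully sorting a large vocabulary.
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return top, bottom


top, bottom = top_and_bottom(logits, 2)
print("positive:", top)     # positive: [('tremend', 1.08), ('proport', 0.83)]
print("negative:", bottom)  # negative: [('ulton', -0.71), ('oos', -0.67)]
```

A heap-based selection is used because only the extremes are needed; sorting the full vocabulary would give the same lists at higher cost.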
Activation Density 0.015%
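Activation density is commonly read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation above 0; `activation_density` is a hypothetical helper, and the activations are synthetic illustrative data, not values recorded for this feature. Under that reading, 0.015% corresponds to roughly 15 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example (not real data): 100,000 tokens, 15 of which fire.
acts = [0.0] * 100_000
for i in range(15):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.015%
```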