Explanations
verbs and expressions related to knowledge and awareness
Negative Logits
ey            -0.16
itos          -0.16
ean           -0.15
allon         -0.15
agne          -0.15
servername    -0.14
erk           -0.14
tery          -0.14
as            -0.14
feather       -0.14
Positive Logits
actual        0.16
ounds         0.15
actual        0.15
ecast         0.15
conds         0.15
ascus         0.14
ioni          0.14
Orm           0.14
観            0.14
endar         0.14
Activations Density: 0.104%