INDEX
Explanations
references to communal spaces and shared experiences
Negative Logits
ardy      -0.18
fone      -0.15
Tune      -0.15
itr       -0.15
iti       -0.15
_TUN      -0.14
ikan      -0.14
ew        -0.14
_SOFT     -0.14
          -0.14
Positive Logits
table     0.81
tables    0.65
Table     0.62
-table    0.61
table     0.60
TABLE     0.60
_table    0.56
Table     0.55
TABLE     0.54
.table    0.53
Activations Density 0.151%